GENEALOGIES OF TWO LINKED NEUTRAL LOCI AFTER A SELECTIVE
SWEEP IN A LARGE POPULATION OF STOCHASTICALLY VARYING SIZE 

REBEKKA BRINK-SPALINK AND CHARLINE SMADI 


Abstract. We study the impact of a hard selective sweep on the genealogy of partially linked neutral loci 
in the vicinity of the positively selected allele. We consider a sexual population of stochastically varying 
size and, focusing on two neighboring loci, derive an approximate formula for the neutral genealogy 
of a sample of individuals taken at the end of the sweep. Individuals are characterized by ecological 
parameters depending on their genetic type and governing their growth rate and interactions with other 
individuals (competition). As a consequence, the “fitness” of an individual depends on the population
state and is not an intrinsic characteristic of individuals. We provide a deep insight into the dynamics 
of the mutant and wild type populations during the different stages of a selective sweep. 


Introduction 

We study the hitchhiking effect of a beneficial mutation in a sexual haploid population of stochastically 
varying size. We assume that a mutation occurs in one individual of a monomorphic population and that 
individuals carrying the new allele a are better adapted to the current environment and spread in the 
population. We suppose that the mutant allele a eventually replaces the resident one, A, and study the 
influence of this fixation on the neutral gene genealogy of a sample taken at the end of the selective sweep. 
That is, in each sampled individual we consider the same set of partially linked loci including the locus 
where the advantageous mutation occurred. We then trace back the ancestral lineages of all loci in the 
sample until the beginning of the sweep and update the genetic relationships whenever a coalescence or a 
recombination changes the ancestry of one or several loci. Our main result is the derivation of a sampling 
formula for the ancestral partition of two neutral loci situated in the vicinity of the selected allele. 

The first studies of hitchhiking, initiated by Maynard Smith and Haigh [50] , have modeled the mutant 
population size as the solution of a deterministic logistic equation [m [n m HU. Barton [T] was the 
first to point out the importance of the stochasticity of the mutant population size. Following this paper, 
a series of works took into account this randomness during the sweep. In 13 HB] Schweinsberg and 
Durrett based their analysis on a Moran model with selection and recombination, while Etheridge and 
coauthors [13 worked with the diffusion limit of such discrete population models. Then Brink-Spalink 0, 
Pfaffelhuber and Studeny and Leocard m extended the respective findings of these two approaches 
for the ancestry of one neutral locus to the two-locus (resp. multiple-locus) case.

However, in all these models, the population size was constant and each individual had a “fitness” 
only dependent on its type and not on the population state. The fundamental idea of Darwin is that 
the individual traits have an influence on the interactions between individuals, which in turn generate 
selection on the different traits. In this paper we aim at modeling precisely these interactions by extending 
the model introduced in m where the author considered only one neutral locus. Such an eco-evolutionary 
approach has been introduced by Metz and coauthors m and has been made rigorous in the seminal paper 
of Fournier and Meleard [12]. Then it was further developed by Champagnat, Meleard and coauthors 
(see [ 3011 ] and references therein) for the haploid asexual case and by Collet, Meleard and Metz [3 and 
Coron and coauthors [7] for the diploid sexual case. 
The population dynamics, described in Section 1, is a multitype birth and death Markov process with
competition. We represent the carrying capacity of the underlying environment by a scaling parameter
$K \in \mathbb{N}$ and state results in the limit for large $K$. In [5] it was shown that such invasion
processes can be divided into three phases (see Figure 2): an initial phase in which the fraction of
a-individuals does not exceed a fixed value $\varepsilon > 0$ and where the dynamics of the wild type population is
nearly undisturbed by the invading type; a second phase where both types account for a non-negligible
percentage of the population and where the dynamics of the population can be well approximated by a
deterministic competitive Lotka-Volterra system; and finally a third phase where the roles of the types
are interchanged and the wild type population is near extinction. The durations of the first and third
phases of the selective sweep are of order $\log K$ whereas the second phase only lasts an amount of time
of order 1. This three-phase decomposition is commonly encountered in population genetics models and
dates back to [16].

In Section 3 we precisely describe these three phases and introduce two couplings of the population
process, key tools to study the dynamics of the A- and a-populations. Section 4 is devoted to the proofs
of the main theorems on the ancestral partition of the two neutral alleles. Sections 5 to 7 are dedicated to
the proofs of auxiliary statements. In Section 2 we compare our findings with previous results. Finally,
we state technical results needed in the proofs in the Appendix.


1. Model and results 

We consider a three-locus model: one locus under selection, SL, with alleles in $\mathcal{A} := \{A, a\}$, and two
neighboring neutral loci N1 and N2 with alleles in the finite sets $\mathcal{B}$ and $\mathcal{C}$ respectively. We denote by
$\mathcal{E} = \mathcal{A} \times \mathcal{B} \times \mathcal{C}$ the type space. Two geometric alignments are possible: either the two neutral loci are
adjacent (geometry SL - N1 - N2), or they are separated by the selected locus (geometry N1 - SL - N2).
We introduce the model and notations for the adjacent geometry; their analogs for the separated one can
be deduced in a straightforward manner.

Whenever a reproduction event takes place, recombinations between SL and N1 or between N1 and
N2 occur independently with probabilities $r_1$ and $r_2$, respectively. These probabilities depend on the
parameter $K$, representing the environment's carrying capacity, but for the purpose of readability we do
not indicate this dependence. We assume a regime of weak recombination:

(1.1) $\limsup_{K \to \infty} r_j \log K < \infty, \quad j = 1, 2.$

This is motivated by Theorem 2 in [13], which states that this is the right scale to observe a signature
on the neutral allele distribution. If the recombination probabilities are larger (neutral loci more distant
from the selected locus), there are many recombinations and the sweep does not modify the neutral
diversity at these sites. Recombinations may lead to a mixing of the parental genetic material in the
newborn, and hence parents with types $\alpha\beta\gamma$ and $\alpha'\beta'\gamma'$ in $\mathcal{E}$ can generate the following offspring:


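possible genotype                                  event                                    probability
$\alpha\beta\gamma$, $\alpha'\beta'\gamma'$        no recombination                         $(1-r_1)(1-r_2)$
$\alpha\beta'\gamma'$, $\alpha'\beta\gamma$        one recombination between SL and N1      $r_1(1-r_2)$
$\alpha\beta\gamma'$, $\alpha'\beta'\gamma$        one recombination between N1 and N2      $(1-r_1)r_2$
$\alpha\beta'\gamma$, $\alpha'\beta\gamma'$        two recombinations                       $r_1 r_2$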
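
The recombination table above can be sketched as a small sampling routine. This is only an illustration under stated assumptions: the function name `offspring_genotype`, the tuple encoding `(alpha, beta, gamma)` of a genotype, and the convention that the newborn takes its selected allele from the parent called `mother` are ours and do not appear in the text.

```python
import random


def offspring_genotype(mother, father, r1, r2, rng=random):
    """Draw one newborn genotype (alpha, beta, gamma) from two parents.

    Hypothetical helper: the newborn keeps the selected allele of `mother`;
    a recombination between SL and N1 (probability r1) or between N1 and N2
    (probability r2) switches the parent from which the next locus is read.
    """
    rec1 = rng.random() < r1  # recombination between SL and N1
    rec2 = rng.random() < r2  # recombination between N1 and N2
    alpha = mother[0]
    beta = father[1] if rec1 else mother[1]
    # gamma comes from the father exactly when the parent was switched once
    gamma = father[2] if (rec1 != rec2) else mother[2]
    return (alpha, beta, gamma)
```

With `r1 = r2 = 0` the newborn is a copy of `mother`, and the three other rows of the table are obtained by forcing one or both recombinations.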


We will see in the sequel that the probability to witness a birth event with two simultaneous recombinations
in the neutral genealogy of a uniformly chosen individual is very small.

As we assume the loci N1 and N2 to be neutral, the ecological parameters of an individual only
depend on the allele $\alpha$ at the locus under selection. Let us denote by $f_\alpha$ the fertility of an individual
with type $\alpha$. In the spirit of [5], such an individual gives birth at rate $f_\alpha$ (female role), and has a
probability proportional to $f_\alpha$ to be chosen as the father in a given birth event (male role). Denoting the
complementary type of the allele $\alpha$ by $\bar\alpha$ we get the following result for the birth rate of individuals of
type $\alpha\beta\gamma$:


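(1.2) $b^K_{\alpha\beta\gamma}(n) = (1-r_1)(1-r_2) f_\alpha n_{\alpha\beta\gamma} + r_1(1-r_2) f_\alpha n_\alpha \dfrac{f_\alpha n_{\alpha\beta\gamma} + f_{\bar\alpha} n_{\bar\alpha\beta\gamma}}{f_a n_a + f_A n_A}$

$\qquad + (1-r_1) r_2 f_\alpha \dfrac{\sum_{(\beta',\gamma') \in \mathcal{B} \times \mathcal{C}} n_{\alpha\beta\gamma'} (f_\alpha n_{\alpha\beta'\gamma} + f_{\bar\alpha} n_{\bar\alpha\beta'\gamma})}{f_a n_a + f_A n_A} + r_1 r_2 f_\alpha \dfrac{\sum_{(\beta',\gamma') \in \mathcal{B} \times \mathcal{C}} n_{\alpha\beta'\gamma} (f_\alpha n_{\alpha\beta\gamma'} + f_{\bar\alpha} n_{\bar\alpha\beta\gamma'})}{f_a n_a + f_A n_A},$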


where $n_{\alpha\beta\gamma}$ (resp. $n_\alpha$) denotes the current number of $\alpha\beta\gamma$-individuals (resp. $\alpha$-individuals) and
$n = (n_{\alpha\beta\gamma}, (\alpha,\beta,\gamma) \in \mathcal{E})$ is the current state of the population. An $\alpha$-individual can die either from a
natural death (rate $D_\alpha$), or from type-dependent competition: the parameter $C_{\alpha,\alpha'}$ models the impact
an individual of type $\alpha'$ has on an individual of type $\alpha$, where $(\alpha,\alpha') \in \mathcal{A}^2$. The strength of the
competition also depends on the carrying capacity $K$. This results in the total death rate of individuals
carrying the alleles $\alpha\beta\gamma$:


(1.3) $d^K_{\alpha\beta\gamma}(n) = \left( D_\alpha + \dfrac{C_{\alpha,A}}{K} n_A + \dfrac{C_{\alpha,a}}{K} n_a \right) n_{\alpha\beta\gamma}.$


Hence the population process

$(N^K(t), t \geq 0) = ((N^K_{\alpha\beta\gamma}(t))_{(\alpha,\beta,\gamma) \in \mathcal{E}}, t \geq 0),$

where $N^K_{\alpha\beta\gamma}(t)$ denotes the number of $\alpha\beta\gamma$-individuals at time $t$, is a multitype birth and death process
with rates given in (1.2) and (1.3). We will often work with the trait population process $((N^K_A(t), N^K_a(t)), t \geq 0)$,
where $N^K_\alpha(t)$ denotes the number of $\alpha$-individuals at time $t$. This is also a birth and death process
with birth and death rates given by:


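(1.4) $b^K_\alpha(n) = \sum_{(\beta,\gamma) \in \mathcal{B} \times \mathcal{C}} b^K_{\alpha\beta\gamma}(n) = f_\alpha n_\alpha,$

$\qquad d^K_\alpha(n) = \sum_{(\beta,\gamma) \in \mathcal{B} \times \mathcal{C}} d^K_{\alpha\beta\gamma}(n) = \left( D_\alpha + \dfrac{C_{\alpha,A}}{K} n_A + \dfrac{C_{\alpha,a}}{K} n_a \right) n_\alpha.$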
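
The trait process with the birth and death rates (1.4) can be simulated with a standard Gillespie scheme. The following is a minimal sketch and not the authors' simulation code: the function name, the dictionary encoding of $f$, $D$ and $C$, and the `seed` argument are assumptions made for the illustration.

```python
import random


def simulate_trait_process(K, f, D, C, n_A0, n_a0, t_max, seed=0):
    """Gillespie simulation of (N_A, N_a) with the rates of (1.4).

    f, D: dicts keyed by 'A' and 'a'; C: dict keyed by pairs (alpha, alpha').
    Returns the list of (time, N_A, N_a) recorded after each jump.
    """
    rng = random.Random(seed)
    n = {"A": n_A0, "a": n_a0}
    t = 0.0
    path = [(t, n["A"], n["a"])]
    while t < t_max and n["A"] + n["a"] > 0:
        events = []
        for x in ("A", "a"):
            birth = f[x] * n[x]
            death = (D[x] + C[(x, "A")] * n["A"] / K
                     + C[(x, "a")] * n["a"] / K) * n[x]
            events.append((x, +1, birth))
            events.append((x, -1, death))
        # keep only possible events, so an empty type can never lose individuals
        events = [ev for ev in events if ev[2] > 0]
        total = sum(rate for _, _, rate in events)
        t += rng.expovariate(total)      # waiting time until the next jump
        u = rng.random() * total         # choose the event proportionally to its rate
        chosen = events[-1]              # fallback against rounding errors
        for ev in events:
            if u < ev[2]:
                chosen = ev
                break
            u -= ev[2]
        n[chosen[0]] += chosen[1]
        path.append((t, n["A"], n["a"]))
    return path
```

Starting from `n_A0` close to the resident equilibrium and `n_a0 = 1` mimics the appearance of a single mutant; most runs then end with the early loss of the mutant, in line with a fixation probability strictly smaller than one.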

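As a quantity summarizing the advantage or disadvantage a mutant with allele type $\alpha$ has in an
$\bar\alpha$-population at equilibrium, we introduce the so-called invasion fitness $S_{\alpha\bar\alpha}$ through

(1.5) $S_{\alpha\bar\alpha} := f_\alpha - D_\alpha - C_{\alpha,\bar\alpha} \bar n_{\bar\alpha},$

where the equilibrium density $\bar n_\alpha$ is defined by

(1.6) $\bar n_\alpha := \dfrac{f_\alpha - D_\alpha}{C_{\alpha,\alpha}}.$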
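
As a quick numerical illustration of (1.5) and (1.6), the two quantities can be evaluated for the parameter values quoted in the caption of Figure 2. This is a sketch under assumptions: the function names and the dictionary encoding are ours, and we read the caption as $(f_A, f_a) = (2, 3)$.

```python
def equilibrium_density(f, D, C, x):
    """Equilibrium density of a monomorphic x-population, as in (1.6)."""
    return (f[x] - D[x]) / C[(x, x)]


def invasion_fitness(f, D, C, x, y):
    """Invasion fitness of a rare x-mutant in a resident y-population, as in (1.5)."""
    return f[x] - D[x] - C[(x, y)] * equilibrium_density(f, D, C, y)


# Parameter values read from the caption of Figure 2 (assumed reading).
f = {"A": 2.0, "a": 3.0}
D = {"A": 0.5, "a": 0.5}
C = {(x, y): 1.0 for x in ("A", "a") for y in ("A", "a")}

n_bar_A = equilibrium_density(f, D, C, "A")   # 1.5
n_bar_a = equilibrium_density(f, D, C, "a")   # 2.5
S_aA = invasion_fitness(f, D, C, "a", "A")    # 1.0: the mutant a can invade
S_Aa = invasion_fitness(f, D, C, "A", "a")    # -1.0: the resident A cannot re-invade
```

For these values the mutant has a positive invasion fitness and the resident a negative one, which is the sign pattern required for a sweep.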


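The role of the invasion fitness $S_{\alpha\bar\alpha}$ and the definition of the equilibrium density $\bar n_\alpha$ follow from the
properties of the two-dimensional competitive Lotka-Volterra system:

(1.7) $\dot n^{(z)}_\alpha = (f_\alpha - D_\alpha - C_{\alpha,A} n^{(z)}_A - C_{\alpha,a} n^{(z)}_a) n^{(z)}_\alpha, \quad z \in \mathbb{R}_+^{\mathcal{A}}, \quad n^{(z)}_\alpha(0) = z_\alpha, \quad \alpha \in \mathcal{A}.$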
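
The convergence of the competitive Lotka-Volterra system towards the monomorphic mutant equilibrium can be checked numerically. The sketch below uses an explicit Euler scheme, which is our choice for brevity and not a method taken from the paper; the function name, the step size and the parameter values (read from the caption of Figure 2) are assumptions.

```python
def lotka_volterra_euler(z, f, D, C, t_end, dt=0.01):
    """Integrate the two-type competitive Lotka-Volterra system by explicit Euler.

    z = (n_A(0), n_a(0)); returns (n_A(t_end), n_a(t_end)).
    """
    n_A, n_a = z
    for _ in range(int(round(t_end / dt))):
        # per capita growth rates of each type in the current state
        g_A = f["A"] - D["A"] - C[("A", "A")] * n_A - C[("A", "a")] * n_a
        g_a = f["a"] - D["a"] - C[("a", "A")] * n_A - C[("a", "a")] * n_a
        n_A, n_a = n_A + dt * g_A * n_A, n_a + dt * g_a * n_a
    return n_A, n_a


f = {"A": 2.0, "a": 3.0}
D = {"A": 0.5, "a": 0.5}
C = {(x, y): 1.0 for x in ("A", "a") for y in ("A", "a")}

# resident at its equilibrium density 1.5, mutant at a small positive density
n_A_end, n_a_end = lotka_volterra_euler((1.5, 0.01), f, D, C, t_end=60.0)
```

Starting with a small positive mutant density, the solution leaves the neighborhood of the resident equilibrium and approaches the state where only the mutant is present at density $(f_a - D_a)/C_{a,a} = 2.5$.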


If we assume

(1.8) $\bar n_A > 0, \quad \bar n_a > 0, \quad \text{and} \quad S_{Aa} < 0 < S_{aA},$

then $\bar n_\alpha$ is the equilibrium size of a monomorphic $\alpha$-population and the system (1.7) has a unique stable
equilibrium $(0, \bar n_a)$ and two unstable steady states $(\bar n_A, 0)$ and $(0, 0)$. Thanks to Theorem 2.1 p. 456
in m we can prove that if $N^K_A(0)$ and $N^K_a(0)$ are of order $K$ and $K$ is large, the rescaled process
$(N^K_A/K, N^K_a/K)$ is very close to the solution of (1.7) during any finite time interval. The invasion fitness
$S_{aA}$ corresponds to the per capita initial growth rate of the mutant a when it appears in a monomorphic
population of individuals A at their equilibrium size $\bar n_A K$. Hence the dynamics of the allele a depends strongly
on the properties of the system (1.7), and it is proven in [3] that under Condition (1.8) one
mutant a has a positive probability to fix in the population and replace a wild type A. More precisely, if
we use the convention


(1.9) $\mathbb{P}^{(K)}(\cdot) := \mathbb{P}(\cdot \mid N^K_A(0) = \lfloor \bar n_A K \rfloor, N^K_a(0) = 1),$

Equation (39) in [3] states that

(1.10) $\lim_{K \to \infty} \mathbb{P}^{(K)}(\mathrm{Fix}^K) = \dfrac{S_{aA}}{f_a} =: s,$

where $s$ is called the rescaled invasion fitness, and the extinction time of the A-population and the event
of fixation of the a-allele are rigorously defined as follows:

(1.11) $T^K_{\mathrm{ext}} := \inf\{t \geq 0 : N^K_A(t) = 0\}$ and $\mathrm{Fix}^K := \{T^K_{\mathrm{ext}} < \infty, N^K_a(T^K_{\mathrm{ext}}) > 0\}.$


From this point onward, we fix $d$ in $\mathbb{N}$. We aim at quantifying the effect of the selective sweep on
the neutral diversity. Our method consists in tracing back the neutral genealogies of $d$ individuals
sampled uniformly at the end of the sweep (time $T^K_{\mathrm{ext}}$) until time 0. Two event types (see Definition
4.1) may affect the relationships of the sampled neutral alleles: coalescences correspond to the merging
of the neutral genealogies of two individuals at one or two neutral loci, and recombinations redistribute
the selected and neutral alleles of one individual into two groups carried by its two parents. We will
represent the neutral genealogies by a partition $\Theta^K_d$ which belongs to the set $\mathcal{P}^*_d$ of marked partitions
of $\{(i,k), i \in \{1,\dots,d\}, k \in \{1,2\}\}$ with (at most) one block distinguished by the mark $*$, which will
correspond to the descendants of the original mutant a. In this notation $(i,1)$ and $(i,2)$ are the neutral
alleles at loci N1 and N2 of the $i$th sampled individual. Let us define rigorously the random partition
$\Theta^K_d$:


Definition 1.1. Sample $d$ individuals uniformly and without replacement at the end of the sweep (time
$T^K_{\mathrm{ext}}$). Follow the genealogies of the first and second neutral alleles of the $i$-th sampled individual, $(i,1)$
and $(i,2)$ for $i \in \{1,\dots,d\}$. Then the partition $\Theta^K_d \in \mathcal{P}^*_d$ is defined as follows: each block of the partition
$\Theta^K_d$ is composed of all those neutral alleles which originate from the same individual alive at the beginning
of the sweep; the block containing the descendants of the mutant a (if such a block exists) is distinguished
by the mark $*$.

We will show in Theorems 1 and 2 that when $K$ is large the partition $\Theta^K_d$ belongs with a probability
close to one to a subset of $\mathcal{P}^*_d$, which is defined as follows:


Definition 1.2. $\Delta_d$ is the subset of $\mathcal{P}^*_d$ consisting of those partitions whose unmarked blocks (if there
are any) are either singletons or pairs of the form $\{(i,1), (i,2)\}$ for one $i \in \{1,\dots,d\}$.

Example 1. In the example represented in Figure 1 the marked partition belongs to $\Delta_5$:

$\Theta^K_5 = \{\{(1,1), (1,2), (2,1), (5,2)\}^*, \{(2,2)\}, \{(3,1), (3,2)\}, \{(4,1)\}, \{(4,2)\}, \{(5,1)\}\}.$

For a partition $\pi \in \Delta_d$ we define for some possible ancestral relationships the number of individuals
in the sample whose two neutral loci are related in that particular way:


Definition 1.3. Let $d \in \mathbb{N}$ and $\pi \in \Delta_d$. Then we set:

$|\pi|_1 = \#\{1 \leq i \leq d$ such that $(i,1)$ and $(i,2)$ belong to the marked block$\}$

$|\pi|_2 = \#\{1 \leq i \leq d$ such that $(i,1)$ belongs to the marked block and $\{(i,2)\}$ is an unmarked block$\}$

$|\pi|_3 = \#\{1 \leq i \leq d$ such that $(i,2)$ belongs to the marked block and $\{(i,1)\}$ is an unmarked block$\}$

$|\pi|_4 = \#\{1 \leq i \leq d$ such that $\{(i,1), (i,2)\}$ is an unmarked block$\}$

$|\pi|_5 = \#\{1 \leq i \leq d$ such that $\{(i,1)\}$ and $\{(i,2)\}$ are two distinct unmarked blocks$\}$
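
The five counts of Definition 1.3 are straightforward to compute from a marked partition. The sketch below is only an illustration: the function name `ancestry_counts` and the encoding of a marked partition (one set for the marked block, a list of sets for the unmarked blocks) are our assumptions. It is applied to the partition of Example 1.

```python
def ancestry_counts(d, marked, unmarked):
    """Return [|pi|_1, ..., |pi|_5] for a marked partition in the subset of Definition 1.2.

    marked: set of pairs (i, k) forming the marked block (possibly empty);
    unmarked: iterable of sets of pairs, the unmarked blocks.
    """
    blocks = {frozenset(b) for b in unmarked}
    counts = [0] * 5
    for i in range(1, d + 1):
        first, second = (i, 1), (i, 2)
        in1, in2 = first in marked, second in marked
        if in1 and in2:
            counts[0] += 1
        elif in1 and frozenset({second}) in blocks:
            counts[1] += 1
        elif in2 and frozenset({first}) in blocks:
            counts[2] += 1
        elif frozenset({first, second}) in blocks:
            counts[3] += 1
        elif frozenset({first}) in blocks and frozenset({second}) in blocks:
            counts[4] += 1
        else:
            raise ValueError("partition is not of the form required by Definition 1.2")
    return counts


# The marked partition of Example 1 (d = 5).
marked_block = {(1, 1), (1, 2), (2, 1), (5, 2)}
unmarked_blocks = [{(2, 2)}, {(3, 1), (3, 2)}, {(4, 1)}, {(4, 2)}, {(5, 1)}]
example_counts = ancestry_counts(5, marked_block, unmarked_blocks)
```

In Example 1 each of the five sampled individuals falls into a different category, so every count equals one.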


To express the limit distribution of the partition $\Theta^K_d$ we need to introduce:




fA/fa 


- <7192) 


(1.12) 9i := e , 92 := e , 92 := e' - 

n +r2(l - fA/fa) 

where the invasion fitnesses have been defined in (1.5). We did not make any assumption on the sign of
$f_a(r_1 + r_2) - f_A r_2$, but $q_3$ can be written in the form $\delta(e^{-\mu} - e^{-\nu})/(\nu - \mu)$ for $(\delta, \mu, \nu) \in \mathbb{R}_+^3$ so that it is
well defined and non-negative. It is easy to check that $q_3 < 1$. The forms of $q_1$, $q_2$ and $\bar q_2$ are intuitive (see
comments of Proposition 1). The form of $q_3$ is more complex to explain and results from a combination of
different possible genealogical scenarios during the first phase. We now define five non-negative numbers
$(p_k, 1 \leq k \leq 5)$ which will quantify the law of $\Theta^K_d$ for large $K$ in Theorem 1:

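(1.13) $p_1 := q_1 q_2 [1 - (1 - q_1)(1 - \bar q_2)], \quad p_2 := q_1 [(1 - q_1 q_2) - q_2 \bar q_2 (1 - q_1)],$

$\qquad p_3 := q_1 q_2 (1 - \bar q_2)(1 - q_1), \quad p_4 := \bar q_2 q_3 \quad \text{and} \quad p_5 := (1 - q_1)(1 - q_1 q_2 (1 - \bar q_2)) - \bar q_2 q_3.$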
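
The five weights can be evaluated numerically once the four quantities $q_1$, $q_2$, $\bar q_2$ and $q_3$ are given. The following sketch treats them as plain input numbers and checks two consistency properties: the weights sum to one, and the marginal probabilities that the allele at N1 (resp. N2) escapes the sweep are $1 - q_1$ (resp. $1 - q_1 q_2$). The function name and the numerical values are assumptions, and the assignment of $q_2$ versus $\bar q_2$ in each factor follows our reading of (1.13).

```python
import math


def sampling_weights(q1, q2, q2_bar, q3):
    """Return (p1, ..., p5) as in (1.13), with q1, q2, q2_bar, q3 given as numbers."""
    p1 = q1 * q2 * (1 - (1 - q1) * (1 - q2_bar))
    p2 = q1 * ((1 - q1 * q2) - q2 * q2_bar * (1 - q1))
    p3 = q1 * q2 * (1 - q2_bar) * (1 - q1)
    p4 = q2_bar * q3
    p5 = (1 - q1) * (1 - q1 * q2 * (1 - q2_bar)) - q2_bar * q3
    return p1, p2, p3, p4, p5


# Arbitrary illustrative values in (0, 1); they are not taken from the paper.
p1, p2, p3, p4, p5 = sampling_weights(0.8, 0.7, 0.9, 0.1)
total = p1 + p2 + p3 + p4 + p5
escape_N1 = p3 + p4 + p5   # the allele at N1 is not in the marked block
escape_N2 = p2 + p4 + p5   # the allele at N2 is not in the marked block
```

Note that the sum does not depend on $q_3$, which only moves mass between $p_4$ and $p_5$.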
Figure 1. Example of genealogy for a 5-sample: dark blue neutral alleles originate from
the mutant and light blue ones from an A-individual. We indicate the selected allele,
A or a, associated with the neutral alleles during the sweep. It can change when a
recombination occurs. Bold lines represent the A(green)- and a(red)-population sizes.
In this example, the two neutral alleles of the first individual, the first neutral allele
of the second individual and the second neutral allele of the fifth individual originate
from the mutant; the two neutral alleles of the third individual originate from the same
A-individual, whereas the two neutral alleles of the fourth individual originate from two
distinct A-individuals.


Note that $\sum_{1 \leq k \leq 5} p_k = 1$. Finally, we introduce an assumption which summarizes all the assumptions
made in this work:


Assumption 1. $(N^K_A(0), N^K_a(0)) = (\lfloor \bar n_A K \rfloor, 1)$ and Conditions (1.1) on the recombination probability
and (1.8) on the equilibrium densities and fitnesses hold.

With Definitions 1.1, 1.2 and 1.3 in mind, we can now state our main results:


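Theorem 1 (Geometry SL - N1 - N2). Under Assumption 1 we have for every $\pi \in \mathcal{P}^*_d$:

$\lim_{K \to \infty} \left| \mathbb{P}^{(K)}(\Theta^K_d = \pi \mid \mathrm{Fix}^K) - \mathbf{1}_{\{\pi \in \Delta_d\}} \, p_1^{|\pi|_1} p_2^{|\pi|_2} p_3^{|\pi|_3} p_4^{|\pi|_4} p_5^{|\pi|_5} \right| = 0.$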


Notice that when $K$ is large, $\Theta^K_d$ belongs to $\Delta_d$ with a probability close to one, and that

$(p_1^{|\pi|_1} p_2^{|\pi|_2} p_3^{|\pi|_3} p_4^{|\pi|_4} p_5^{|\pi|_5}, \ \pi \in \Delta_d)$

is a probability on $\Delta_d$ (depending on $K$). Moreover, this result implies that the $d$ sampled individuals
have asymptotically independent neutral genealogies. With high probability, the neutral alleles of a given
sampled individual $i$ either originate from the first mutant a and belong to the marked block, or escape
the sweep and originate from an A-individual. In this case they belong to an unmarked block which is of
the form $\{(i,1)\}$, $\{(i,2)\}$ or $\{(i,1), (i,2)\}$, according to Definition 1.3. As a consequence, if some neutral
alleles of two distinct sampled individuals escape the sweep, they originate from distinct A-individuals
with high probability. However, the genealogies of the two neutral alleles of a given individual are not
independent. For example the probability that $(i,1)$ and $(i,2)$ escape the sweep is $p_4 + p_5$; the probability
that $(i,1)$ (resp. $(i,2)$) escapes the sweep is $p_3 + p_4 + p_5$ (resp. $p_2 + p_4 + p_5$), and for every $K \in \mathbb{N}$ such
that $r_1 \neq 0$

$(p_3 + p_4 + p_5)(p_2 + p_4 + p_5) = (1 - q_1)(1 - q_1 q_2) < (1 - q_1)(1 - q_1 q_2 + q_1 q_2 \bar q_2) = p_4 + p_5.$

This is due to the fact that if (backwards in time) a recombination first occurs between SL and N1, the
neutral allele at N2, linked to N1, also escapes the sweep. As the term $q_1 q_2 \bar q_2$ does not tend to 0 when
$K$ goes to infinity under Condition (1.1), the only possibility to have an equality in the limit is the case
where $r_1 \log K \ll 1$, or in other words when the probability to see a recombination between SL and N1
is negligible.


Let us now consider the separated geometry, N1 - SL - N2:

Theorem 2 (Geometry N1 - SL - N2). Under Assumption 1 we have for every $\pi \in \mathcal{P}^*_d$:

$\lim_{K \to \infty} \left| \mathbb{P}^{(K)}(\Theta^K_d = \pi \mid \mathrm{Fix}^K) - \mathbf{1}_{\{\pi \in \Delta_d\}} [q_1 q_2]^{|\pi|_1} [q_1 (1 - q_2)]^{|\pi|_2} [(1 - q_1) q_2]^{|\pi|_3} [(1 - q_1)(1 - q_2)]^{|\pi|_5} \right| = 0.$


Again the neutral genealogies of the $d$ sampled individuals are asymptotically independent. Furthermore,
we have independence between the neutral loci. Indeed Theorem 2 means that a neutral allele at
locus N$k$ escapes the sweep with probability $1 - q_k$ independently of all other neutral alleles, including
the allele at the other neutral locus of the same individual. This is due to the fact that in the separated
geometry a recombination between SL and one neutral locus has no impact on the genetic background
of the allele at the other neutral locus. Note in particular that there is no block of the form $\{(i,1), (i,2)\}$
in the limit partition, as the two neutral alleles have a very small probability to recombine at the same
time.


2. Comparison with previous work 

In [18] the authors gave an approximate sampling formula for the genealogy of one neutral locus during
a selective sweep. The population evolved as a two-locus modified Moran model with recombination,
selection, and in particular constant population size. They introduced the fitness of the mutant a as
follows: when one of the iid exponential clocks of the living individuals rings, one picks two individuals
uniformly at random (with replacement), one dies, and the other one gives birth. A replacement of an
a-individual by an A-individual is rejected with probability $s$. In this case, nothing happens. In m,
the author studied the one neutral locus version of the model presented here. It was shown that the
ancestral relationships in a sample taken at the end of the sweep correspond to the ones derived in [18]
when we equate the fitness of m and the rescaled invasion fitness $s = S_{aA}/f_a$ and when we have the
equality \SAa\/fA = SaA/fa (in this case the first and third phases have the same duration, SaA ^ogK/ fa). 

In [2], the author generalized the model introduced in [18] to two neutral loci and used similar
methods to derive a corresponding statement for the genealogy of a sample taken at the end of the sweep.
If we however make the analogous comparison and try to match our result for the adjacent geometry
with the statement from [5], we observe an interesting phenomenon: the probabilities of the different
types of ancestry only coincide if the birth rates of a- and A-individuals are the same, that is, if $f_a = f_A$.
In biology, the fitness describes the ability to both survive and reproduce, and can be defined by the
average contribution of an individual with a given genotype to the gene pool of the next generation.
Hence a mutation which affects the fitness of an individual in a given environment can either act on
the fertility ($f_\alpha$ in our model), or on the death rate, intrinsic ($D_\alpha$) or by competition ($C_{\alpha,\alpha'}$), or on
both. Our result is comparable to that of [18] if the mutation only affects the death rate (and still if
$s = S_{aA}/f_a = |S_{Aa}|/f_A$).

In [17], instead of a birth and death process, the authors modeled the population with a structured
coalescent. It is shown that this process can be approximated by a marked Yule tree where the different
marks are realized by Poisson processes and indicate a recombination of one or two loci into the wild
type background. The impact of the third phase is taken into account by a certain refinement prior to
the beginning of the coalescent, which leads to the same effect of splitting of the two neutral loci as
seen here. We again find similarities with our results when $f_a = f_A$. In contrast, the techniques and
precision used there yield that coalescent events with A-individuals cannot be ignored, that is, there are
neutral loci of different individuals from the sample which have the same type-A ancestor. The structure
of the sample is therefore different from our results here. Notice that this is also the case for the second
approximate sampling formula stated in [18], which is more precise than the first one.

3. Dynamics of the sweep and couplings 

3.1. Description of the three phases. We only need to focus on the trajectories of the population
process where the mutant allele a goes to fixation and replaces the resident allele A. Champagnat has
described these trajectories in [3] and in particular divided the sweep into three phases with distinct A-
and a-population dynamics (see Figure 2). In the sequel, $\varepsilon$ will be a positive real number independent
of $K$, as small as needed for the different approximations to hold. Moreover, from this point onward we
will write $N_\alpha$ (resp. $N_{\alpha\beta\gamma}$) instead of $N^K_\alpha$ (resp. $N^K_{\alpha\beta\gamma}$) and $\mathbb{P}$ instead of $\mathbb{P}^{(K)}$ for the sake of readability.
Figure 2. The three phases of a selective sweep. The y-axis corresponds to population
sizes (A in black, a in red), and the x-axis to the time. In this simulation, $K = 1000$,
$(f_A, f_a) = (2, 3)$, $D_\alpha = 0.5$, $\alpha \in \mathcal{A}$, $C_{\alpha,\alpha'} = 1$, $(\alpha, \alpha') \in \mathcal{A}^2$. We have also indicated some
of the notations introduced in Section 3.


First phase. The resident population size stays close to its equilibrium value $\bar n_A K$ as long as the mutant
population size has not hit $\lfloor \varepsilon K \rfloor$: if we introduce the finite subset of $\mathbb{N}$


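(3.1) $I^K_\varepsilon := \left[ K \left( \bar n_A - 2\varepsilon \dfrac{C_{A,a}}{C_{A,A}} \right), K \left( \bar n_A + 2\varepsilon \dfrac{C_{A,a}}{C_{A,A}} \right) \right] \cap \mathbb{N},$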


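and the stopping times $T^K_\varepsilon$ and $S^K_\varepsilon$, which denote respectively the hitting time of $\lfloor \varepsilon K \rfloor$ by the mutant
population size and the exit time of $I^K_\varepsilon$ by the resident population size,

(3.2) $T^K_\varepsilon := \inf\{t \geq 0, N_a(t) = \lfloor \varepsilon K \rfloor\}$ and $S^K_\varepsilon := \inf\{t \geq 0, N_A(t) \notin I^K_\varepsilon\},$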


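then we can deduce from [3] (see Equations (A.5) and (A.6) in [19] for the details of the derivation) that
the events $\mathrm{Fix}^K$, $\{T^K_\varepsilon < S^K_\varepsilon\}$ and $\{T^K_\varepsilon < \infty\}$ are very close:

(3.3) $\limsup_{K \to \infty} \mathbb{P}^{(K)}(\{T^K_\varepsilon < S^K_\varepsilon\} \,\Delta\, \mathrm{Fix}^K) \leq c\varepsilon$, and $\limsup_{K \to \infty} \mathbb{P}^{(K)}(\{T^K_\varepsilon < \infty\} \,\Delta\, \mathrm{Fix}^K) \leq c\varepsilon,$

for a finite $c$ and $\varepsilon$ small enough, where we recall convention (1.9). In this context, $\Delta$ is the symmetric
difference: for two sets $B$ and $C$, $B \,\Delta\, C = (B \cap C^c) \cup (C \cap B^c)$. From this point onwards, “first phase”
will denote the time interval $[0, T^K_\varepsilon]$ when the a-population size is smaller than $\lfloor \varepsilon K \rfloor$.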

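Second phase. When $N_A$ and $N_a$ are of order $K$, the rescaled population process $(N_A/K, N_a/K)$ is
well approximated by the Lotka-Volterra system (1.7). Moreover, under Condition (1.8) the system (1.7)
has a unique attracting equilibrium $(0, \bar n_a)$ for initial conditions $z$ satisfying $z_a > 0$, where $\bar n_a$ has been
defined in (1.6). In particular, if we introduce for $(n_A, n_a) \in \mathbb{N}^2$ the notation

(3.4) $\mathbb{P}_{(n_A, n_a)}(\cdot) := \mathbb{P}(\cdot \mid N_A(0) = n_A, N_a(0) = n_a),$

then Theorem 3 (b) in [3] implies:

(3.5) $\lim_{K \to \infty} \sup_{z \in \Gamma} \mathbb{P}_{(\lfloor z_A K \rfloor, \lfloor z_a K \rfloor)} \Big( \sup_{0 \leq t \leq t_\varepsilon, \, \alpha \in \mathcal{A}} |N_\alpha(t)/K - n^{(z)}_\alpha(t)| \geq \delta \Big) = 0,$

for every $\delta > 0$, where

(3.6) $\Gamma := \{z \in \mathbb{R}_+^{\mathcal{A}}, \lfloor z_A K \rfloor \in I^K_\varepsilon, z_a \in [\varepsilon/2, \varepsilon]\},$

(3.7) $A(z) := \inf\{s \geq 0, \forall t \geq s, \ n^{(z)}_A(t) \in [0, \varepsilon^2/2], \ n^{(z)}_a(t) \in [\bar n_a - \varepsilon/2, \bar n_a + \varepsilon/2]\},$

$\qquad t_\varepsilon := \sup\{A(z), z \in \Gamma\} < \infty.$

In the sequel, “second phase” will denote the time interval $[T^K_\varepsilon, T^K_\varepsilon + t_\varepsilon]$ when the population process
is close to the solution of the system (1.7).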
Third phase. Equation (3.5) also implies that

(3.8) lim pfiYd&M g 

A->oo V ^ 


K 


< e, 


f Na{T^^) 

\ K ^ K 


l)cr) 


= 1 , 


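where

(3.9) $2\omega_1 := \inf\{n^{(z)}_A(t_\varepsilon), z \in \Gamma\} > 0$, and $\omega_2 := 2 \sup\{n^{(z)}_A(t_\varepsilon), z \in \Gamma\} \leq \varepsilon^2.$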

The “third phase”, which corresponds to the time interval $[T^K_\varepsilon + t_\varepsilon, T^K_{\mathrm{ext}}]$, can be seen as the symmetric
counterpart of the first phase, where the roles of A and a are interchanged: during the extinction of the
A-population, the a-population size stays close to its equilibrium value $\bar n_a K$.

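Let us introduce the positive real number $M'' := 3 + (f_a + C_{a,A})/C_{a,a}$ and the finite subset of $\mathbb{N}$

(3.10) $J^K_\varepsilon := [K(\bar n_a - M''\varepsilon), K(\bar n_a + M''\varepsilon)] \cap \mathbb{N}.$

The times $T^{(K,A)}_u$ and $S^{(K,a)}_\varepsilon$ are two stopping times for the process restarted after the second phase and
denote respectively the hitting time of $\lfloor uK \rfloor$ by the A-population for $u \in \mathbb{R}_+$, and the exit time of
$J^K_\varepsilon$ by the a-population during the third phase,

(3.11) $T^{(K,A)}_u := \inf\{t \geq 0, N_A(T^K_\varepsilon + t_\varepsilon + t) = \lfloor uK \rfloor\}, \quad S^{(K,a)}_\varepsilon := \inf\{t \geq 0, N_a(T^K_\varepsilon + t_\varepsilon + t) \notin J^K_\varepsilon\}.$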


If we define the event 
(3.12) 


ATf := {Tf < } n e [a;i,u;2]. 


Na{Tf^+t,) 

K 


<e 


we get from the proof of Lemma 3 in [3] that for a finite $c$ and $\varepsilon$ small enough,
(3.13) 

limsup IP(Fix^ A [Aff n A 5'^’“)}]) -h P(Fix-^ A [ATf n 

K—^oo ^ 


< C£. 


To summarize, the fixation event $\mathrm{Fix}^K$ is very close to the following succession of events:

• The a-population size hits $\lfloor \varepsilon K \rfloor$ before the A-population size has escaped the vicinity of its
equilibrium (first phase)

• The rescaled population process $N/K$ is close to the deterministic competitive Lotka-Volterra
system during the second phase

• The A-population size gets extinct before hitting $\lfloor \varepsilon K \rfloor$ and before the a-population size has
escaped the vicinity of its equilibrium (third phase)


3.2. Couplings for the first and third phases. We are interested in the law of the neutral genealogies
on the event $\mathrm{Fix}^K$. Equations (3.3) and (3.13) imply that it is enough to concentrate our attention on
the event (~) but the dynamics of the population process N conditionally on this 

event is complex to study. Indeed it boils down to studying the dynamics of a process conditioned on a 
future event {{T^ < Sf} for the first phase and for the third one). Hence the idea 

is to couple the population process with two processes, N and N, whose laws are easier to study. These 
processes will satisfy: 

(3.14) limsup P{{3t < T^,N{t) ^ N{t)},T^ < 00 ) < c£. 

K—^oo 

and 

(3.15) limsup P({3 0 < t - {T^ + A) < T^^’^\N{t) ^ N{t)},T^^’^'> < < 00 ) < c£. 

K^oo 

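Let $\alpha$ be in $\mathcal{A}$ and $n$ be in $\mathbb{N}^{\mathcal{E}}$. Denote the $\alpha$ component of the population state:

(3.16) $n^{(\alpha)} := \sum_{(\beta,\gamma) \in \mathcal{B} \times \mathcal{C}} n_{\alpha\beta\gamma} \, e_{\alpha\beta\gamma},$

where $(e_{\alpha\beta\gamma}, (\alpha,\beta,\gamma) \in \mathcal{E})$ is the canonical basis of $\mathbb{R}^{\mathcal{E}}$. We are now able to introduce a process needed
to describe the couplings: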

Definition 3.1. We denote by Moran process of type a with recombination r 2 a process ^ with 

values in initial state n^°‘\ and the following dynamics: 
• After an exponential time with parameter $f_\alpha \bar n_\alpha K$ we pick uniformly and with replacement three
individuals and draw a Bernoulli variable $R$ with parameter $r_2$

• The first individual dies, the second one gives birth to an individual carrying its alleles at loci SL
and N1, the third one is the potential second parent

• If $R = 0$, there is no recombination and the allele at locus N2 of the newborn is also inherited
from the second individual; if $R = 1$ there is a recombination between N1 and N2 and the newborn
inherits its second neutral allele from the third individual

• We again draw an exponential variable with parameter $f_\alpha \bar n_\alpha K$ and restart the procedure
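
One step of the dynamics listed above can be sketched as follows. This is a minimal illustration and not the construction used in the proofs: the function name `moran_step`, the list-of-tuples encoding of the population, and passing the jump rate as a plain number `rate` (standing for the parameter of the exponential time) are our assumptions.

```python
import random


def moran_step(population, rate, r2, rng):
    """Perform one event of the Moran dynamics with recombination between N1 and N2.

    population: list of genotypes (alpha, beta, gamma), all sharing the same alpha;
    it is modified in place. Returns the exponential waiting time before the event.
    """
    wait = rng.expovariate(rate)
    size = len(population)
    # three individuals drawn uniformly and with replacement
    dying, parent, second_parent = (rng.randrange(size) for _ in range(3))
    alpha, beta, gamma = population[parent]
    if rng.random() < r2:
        # R = 1: the allele at N2 is inherited from the third individual
        gamma = population[second_parent][2]
    # the first individual dies and is replaced by the newborn
    population[dying] = (alpha, beta, gamma)
    return wait
```

The population size stays constant by construction, and with `r2 = 0` every newborn is an exact copy of its parent.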

Coupling with N: N and N are equal up to time after this time the A individuals in the population 
process N follow a Moran process with recombination independent of the a-individuals. Let ^ be in 
B X C. We let the a-population evolve as if the a-individuals were interacting with Na{s) individuals 
with genotype 

(3.17) N{t) = l,^sfN{t) + h>sf{MR^f^^"^\t-S^)+ ^ 

(/3,7)GBxC 

where Mr!'^ ^ has been defined in Definition [Q and {Q^p^,i G {1,2}, (/3,j) G B x C) are independent 

Poisson Point processes with density dsdO, also independent of MR^^'' . The reason for the construction 
of such a coupling is that we need to control the A-population size and the number of births of A- 
individuals during the first phase in Section [5] With the process N such control is achieved easier. 
Coupling with TV: we assume that from (13.121) holds; N and N are equal up to time -|- 
tg -|- A Then the a-individuals in the population process N follow a Moran process with 

recombination independent of the A-individuals, and each A/Jy-population evolves as a birth and death 
process with individual birth and death rates /a and /a -f I^aqI, independent of the a-individuals and 
the A/T'y'-populations with (/9,7) ^ (/3',7'): 



(3.18) i^(Tf + t, + t) = l^^g(K,.)iV(Tf + t,+t) + 

+ + t 


where MR^ ^ has been defined in Definition 13.11 and is independent of the sequence of independent 
Poisson measures {Qp-y, (/3,7) G B x C), with intensity dsdO. The a-population size and the number of 

births of a-individuals will be easy to control for the process N during the third phase, and again we will 
need such control in Section [51 


Qpy{ds, d0)|] 


{0<9<fA^AM{>>-)} '^{0<e-fA^AfiA^-)<UA + \SAa\)^AMi^-)} . 


^APy 

iP,y)eBxC 




Inequality (3.14) follows from (3.3). Moreover, from the proof of Lemma 3 in [5] we know that

liminf P(ro^^’^^ < a ) > 1 - ce 

K^oo 

for a finite $c$ and $\varepsilon$ small enough. Adding (3.13) we get that (3.15) is also satisfied. Hence we will study

the processes N and N and deduce properties of the dynamics of the process N during the first and third 
phases. 


4. Proofs of the main results 

As the proof of Theorem 2 is simpler than that of Theorem 1 and follows essentially the same ideas,
we only prove Theorem 1.

4.1. Events impacting the genealogies in each phase. Let us now summarize the results on the
genealogies for the three successive phases of the sweep that we will derive in Sections 5 and 7.
First phase: As explained in the previous section, we work with the process N to study the first phase.
Let us introduce the jump times of N: 

(4.1) T-|f=0 and =mi{t> ^ m>l. 

The number of jumps during the first phase is denoted by 

(4.2) J^(l) := inf{m G N,iV„(r^) = [eK\}. 

Coalescence and recombination events are defined as follows (see Figure [3]): 

Definition 4.1. Sample two distinct individuals at time $\tau^K_m$ and denote by $\alpha\beta\gamma$ and $\alpha'\beta'\gamma'$ their types.

We say that $\beta$ and $\beta'$ coalesce at time $\tau^K_m$ if they are carried by two distinct individuals at time $\tau^K_m$ and
by the same individual at time $\tau^K_{m-1}$. Seen forwards in time it corresponds to a birth and hence a copy
of the neutral allele. Seen backwards in time it corresponds to the fusion of two neutral alleles into one,
carried by one parent of the newborn. We define in the same way coalescent events at locus N2 (resp.
loci N1 and N2) for alleles $\gamma$ and $\gamma'$ (resp. allele pairings $(\beta, \gamma)$ and $(\beta', \gamma')$).

We say that $\beta$ (and/or $\gamma$) recombines at time $\tau^K_m$ from the $\alpha$- to the $\alpha'$-population if the individual
carrying the allele $\beta$ (and/or $\gamma$) at time $\tau^K_m$ is a newborn, carries the allele $\alpha$ inherited from its first parent,
and has inherited its allele $\beta$ (and/or $\gamma$) from a different individual carrying allele $\alpha'$.

We are only interested in recombinations which entail new associations of alleles. In particular we will
not consider the simultaneous recombinations of a pair $(\beta, \gamma)$ within the $\alpha$-population.




Figure 3. Illustration of Definition 4.1: the newborn (individual k) has inherited the
selected allele from its "white" parent and the two neutral alleles from its "blue" parent;
hence the encircled neutral loci (of individuals i and k) coalesce at time $\tau_m^K$. In terms
of recombinations, the two neutral loci of the newborn individual recombine at time $\tau_m^K$
from the a- to the A-population.


Let us now describe the genealogical scenarios which modify the ancestral relationships between the
neutral alleles of one individual and occur with positive probability when K is large. We first focus on
the first phase and pick uniformly an individual i from the a-population at time $T_\varepsilon^K$. We introduce:

$NR(i)^{(1)}$ : there is no recombination into the A-population affecting (i, 1) or (i, 2)
and both neutral loci of the i-individual originate from the first mutant,

$R2(i)^{(1)}$ : only the neutral allele (i, 2) is affected by a recombination with the A-population,
hence (i, 1) originates from the first mutant and (i, 2) from an A-individual,

$R12(i)^{(1)}$ : one recombination between SL and N1 from the a- into the A-population occurs
and both neutral alleles (i, 1) and (i, 2) originate from the same A-individual,

$[2,1]^{rec}_{A,i}$ : first (backwards in time) (i, 2) recombines into the A-population, then (i, 1)
recombines into the A-population and connects to a different individual than (i, 2),

$[12,2]^{rec}_{A,i}$ : first (backwards in time) the tuple {(i, 1), (i, 2)} recombines into the A-population,
then a second recombination splits the two neutral loci inside the A-population,

$R1|2(i)^{(1,ga)}$ : $[2,1]^{rec}_{A,i} \cup [12,2]^{rec}_{A,i}$ (see Figure 4).


Finally, we introduce a conditional probability for the process N:

(4.3) $P^{(1)}(\cdot) = P(\cdot \mid J^K(1) < \infty)$,














Figure 4. Illustration of events $[2,1]^{rec}_{A,i}$ (individual 1) and $[12,2]^{rec}_{A,i}$ (individual 2)

where $J^K(1)$ has been defined in (4.2). Hence, recalling the definition of $(q_1, q_2, q_3)$ in (1.12), we will prove
in Section 6:

Proposition 1 (Neutral genealogies during the first phase). Let i be an a-individual sampled uniformly
at the end of the first phase (time $T_\varepsilon^K$). Under Assumption 1 there exist two finite constants c and $\varepsilon_0$
such that for every $\varepsilon \leq \varepsilon_0$,

$$\limsup_{K \to \infty} \Big\{ \big|P^{(1)}(NR(i)^{(1)}) - q_1 q_2\big| + \big|P^{(1)}(R2(i)^{(1)}) - q_1(1 - q_2)\big| + \big|P^{(1)}(R12(i)^{(1)}) - q_3\big| + \big|P^{(1)}(R1|2(i)^{(1,ga)}) - (1 - q_1 - q_3)\big| \Big\} \leq c\varepsilon.$$


For large K, the sum of the four probabilities of Proposition 1 equals one up to a constant times ε.
Hence, in the limit we only observe the events described above. The probabilities of the first two
events are quite intuitive: broadly speaking, the probability to have no recombination at a birth event is
$1 - r_1 - r_2$, the birth rate is $f_a$ and the duration of the first phase is $\log K/S_{aA}$. Hence under $P^{(1)}$, the
probability of the event $NR(i)^{(1)}$ is approximately

$$(1 - (r_1 + r_2))^{f_a \log K/S_{aA}} \sim \exp\big(-(r_1 + r_2) f_a \log K/S_{aA}\big) = q_1 q_2.$$

Similarly the probability to have no recombination between SL and N1 is close to $q_1$ and subtracting the
probability of $NR(i)^{(1)}$ we get that of $R2(i)^{(1)}$. The probabilities of $R12(i)^{(1)}$ and $R1|2(i)^{(1,ga)}$ are more
involved. The proofs rely on a fine study of the different possible scenarios.
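
The heuristic above can be checked numerically. The following is only a sketch: it assumes recombination probabilities of the form $r_j = \lambda_j/\log K$ (one way to satisfy a condition of the type (1.1)), and the function name and all parameter values are hypothetical, chosen for illustration and not taken from the paper.

```python
import math

def no_recombination_probability(r1, r2, f_a, s_aA, K):
    """Compare (1 - (r1 + r2))**n with exp(-(r1 + r2) * n), n = f_a * log(K) / S_aA.

    n stands for the heuristic number of birth events met by one lineage
    during the first phase (birth rate f_a times duration log(K) / S_aA).
    """
    n = f_a * math.log(K) / s_aA
    exact = (1.0 - (r1 + r2)) ** n
    limit = math.exp(-(r1 + r2) * n)  # heuristic value of q1 * q2
    return exact, limit

# Hypothetical parameter values, for illustration only.
f_a, s_aA = 2.0, 1.0
lam1, lam2 = 0.3, 0.2
for K in (10**3, 10**6, 10**12):
    r1, r2 = lam1 / math.log(K), lam2 / math.log(K)
    exact, limit = no_recombination_probability(r1, r2, f_a, s_aA, K)
    print(K, exact, limit)
```

With this scaling the exponential value does not depend on K, and the gap between the two expressions shrinks as K grows, which is the sense in which the approximation is used.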


Second phase: We work with the process N to study the second phase. The latter one has a duration
of order 1, and the recombination probabilities are negligible with respect to one (Condition (1.1)).
Consequently, with high probability no event impacting the genealogies of the neutral loci occurs during
the second phase. More precisely, let us sample uniformly two distinct a-individuals i and j at the end of
the second phase (time $T_\varepsilon^K + t_\varepsilon$) and introduce the events:

$NR(i)^{(2)}$ : there is no recombination affecting (i, 1) or (i, 2),

$NC(i,j)^{(2)}$ : there is no coalescence between the neutral genealogies of i and j.

Then we have the following result, which will be proven in Section 7.

Proposition 2 (Neutral genealogies during the second phase). Let i and j be two distinct a-individuals
sampled uniformly at the end of the second phase (time $T_\varepsilon^K + t_\varepsilon$). Then under Assumption 1,

$$\lim_{K \to \infty} P\big(NR(i)^{(2)} \cap NC(i,j)^{(2)} \,\big|\, T_\varepsilon^K \leq S_\varepsilon^K\big) = 1.$$

Third phase: Finally, we focus on the process N. When K is large, there is only one event occurring
with positive probability during the third phase which may modify the ancestry of the neutral alleles of
an individual i sampled at the end of the sweep in the adjacent geometry:

(4.4) $R2(i)^{(3,ga)}$ : a recombination between loci N1 and N2 occurs and separates
(i, 1) and (i, 2) within the a-population.

Indeed, if we also define the events

$NR(i)^{(3)}$ : there is no recombination affecting (i, 1) or (i, 2) and they both
originate from the same a-individual at the end of the second phase,

$NC(i,j)^{(3)}$ : defined as $NC(i,j)^{(2)}$ for two distinct individuals sampled
uniformly at the end of the sweep,

and the conditional probability for the process N:

(4.5) := P(.|Arf 

where and are the analogs of and (defined in (13.111) 1 for the process N, then 

we will prove in Section [7J 


Proposition 3 (Neutral genealogies during the third phase). Let i and j be two distinct a-individuals
sampled uniformly at the end of the sweep. Under Assumptions 1 and 2 there exist two finite constants c and
$\varepsilon_0$ such that for every $\varepsilon \leq \varepsilon_0$,

$$\limsup_{K \to \infty} \Big\{ \big|P^{(3)}(R2(i)^{(3,ga)}) - (1 - \bar{q}_2)\big| + \big|P^{(3)}(NR(i)^{(3)}) - \bar{q}_2\big| + \big|P^{(3)}(NC(i,j)^{(3)}) - 1\big| \Big\} \leq c\sqrt{\varepsilon}.$$

In particular, there is no recombination with the A-population during the third phase. As for the
probabilities of the first two events in Proposition 1, this result is quite intuitive, as the duration of
the third phase is close to $\log K/|S_{Aa}|$.


Independence: Finally we again consider the population process N and state a proposition which 
enables us to give the statement of Theorem [T] independently for all sampled individuals, that is, jointly 
for the whole sample. To this aim, let us introduce a partition G which is the analog of 

where the d individuals are sampled at the end of the first phase and not at the end of the sweep. Recall 
Definitions O and [Ql and denote by (resp. the number of a-individuals in a d- 

sample taken at the end of the sweep whose neutral alleles originate from two distinct a-individuals (resp. 
from the same a-individual) at the beginning of the third phase. Then we have the following result: 

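Proposition 4. Let Assumption 1 hold. Then there exist two finite constants c and $\varepsilon_0$ such that for
every $\varepsilon \leq \varepsilon_0$, the ancestral relationships of a d-sample taken at the end of the first phase (time $T_\varepsilon^K$)
satisfy for every $(m_k, 1 \leq k \leq 4) \in \mathbb{Z}_+^4$:

$$\limsup_{K \to \infty} \Big| P\big(|\Theta_d^{(1)}|_k = m_k,\ 1 \leq k \leq 4 \,\big|\, T_\varepsilon^K \leq S_\varepsilon^K\big) - \mathbf{1}_{\{m_1+m_2+m_3+m_4=d\}} \frac{d!}{m_1!\,m_2!\,m_3!\,m_4!}\, (q_1 q_2)^{m_1} (q_1(1 - q_2))^{m_2} q_3^{m_3} (1 - q_1 - q_3)^{m_4} \Big| \leq c\varepsilon.$$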


In the same way, the neutral genealogy of a d-sample taken at the end of the sweep satisfies for every 

(mfc, 1 < fc < 2) G Z^; 

d\ 


lim sup 

K—^oo 


P((|R2(3.9“)|,, |iVR(3)|,) = (™,,^2)|AAf) - - <12)^^ q^ 


TOi!m2! 


< CE. 


Proposition 4 is a key result: we only need to focus on individual neutral genealogies to get general
results on the genealogy of a d-sample with respect to the neutral loci. It will be proven in Section [5].
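
To make the statement concrete, the limiting law of the first-phase counts can be evaluated directly. The sketch below assumes that the limit in Proposition 4 is the multinomial law with cell probabilities $(q_1 q_2,\ q_1(1-q_2),\ q_3,\ 1-q_1-q_3)$, as read from the statement above; the function names and the numerical values are hypothetical.

```python
from itertools import product
from math import factorial, prod

def first_phase_law(q1, q2, q3):
    """Limiting single-individual probabilities of NR, R2, R12 and R1|2."""
    return (q1 * q2, q1 * (1 - q2), q3, 1 - q1 - q3)

def multinomial_pmf(counts, probs):
    """Probability of the cell counts `counts` among sum(counts) independent draws."""
    coeff = factorial(sum(counts))
    for m in counts:
        coeff //= factorial(m)
    return coeff * prod(p ** m for p, m in zip(probs, counts))

# Hypothetical values with 1 - q1 - q3 >= 0.
probs = first_phase_law(q1=0.6, q2=0.7, q3=0.1)
d = 3
total = sum(
    multinomial_pmf(m, probs)
    for m in product(range(d + 1), repeat=4)
    if sum(m) == d
)
print(probs, total)
```

Summing over all count vectors with $m_1+m_2+m_3+m_4=d$ returns one, which is the normalization implicit in the indicator of Proposition 4.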


4.2. Proof of Theorem [1]. Let i be an individual sampled uniformly at the end of the sweep. The idea
of the proof is the following: in a first step, we list certain compositions of coalescent and recombination 
events leading to specific ancestral relationships which could be described by blocks of a partition of A^. 
Then we approximate the probabilities of the described events and prove that these probabilities sum 
to one up to a constant times $\sqrt{\varepsilon}$ for some fixed small ε. This shows that in the limit for large K the
neutral genealogy of the individual i belongs to those described on page [TUI with a probability close to 
one. In a second step we use Proposition 4 to treat the neutral genealogies of the d sampled individuals
independently. 
i) We consider two possible trajectories such that the alleles at both neutral loci originate from the
mutant: either the two neutral loci separate inside the a-population during the third phase and
coalesce during the first phase, or they stay in the a-population and do not separate during the
whole sweep (see individual 1 in Figure 1):

$$\big(R2(i)^{(3,ga)} \cap NR(i_1)^{(2)} \cap NR(i_2)^{(2)} \cap NC(i_1,i_2)^{(2)} \cap [NR(i_1)^{(1)} \cup R2(i_1)^{(1)}] \cap NR(i_2)^{(1)}\big) \sqcup \big(NR(i)^{(3)} \cap NR(i)^{(2)} \cap NR(i)^{(1)}\big),$$

where $\sqcup$ is the disjoint union and we denote by $i_1$ and $i_2$ the labels of the parents of the first
and second neutral loci of i, respectively, at the end of the second phase (the way we label the
a-individuals has no importance as they are exchangeable).

ii) We consider two possible trajectories such that (i, 1) originates from the mutant and (i, 2)
originates from some A-individual:

$$\big(R2(i)^{(3,ga)} \cap NR(i_1)^{(2)} \cap NR(i_2)^{(2)} \cap NC(i_1,i_2)^{(2)} \cap [NR(i_1)^{(1)} \cup R2(i_1)^{(1)}] \cap [R12(i_2)^{(1)} \cup R1|2(i_2)^{(1,ga)} \cup R2(i_2)^{(1)}]\big) \sqcup \big(NR(i)^{(3)} \cap NR(i)^{(2)} \cap R2(i)^{(1)}\big).$$

The first bracket considers a separation of the two neutral loci during the third phase. As a
consequence, the fate of the first neutral locus of individual $i_2$ during the first phase has no
consequence on the neutral genealogy of i. This is why we consider the event $\{R12(i_2)^{(1)} \cup R1|2(i_2)^{(1,ga)} \cup R2(i_2)^{(1)}\}$
and not only $\{R2(i_2)^{(1)}\}$. The second bracket corresponds to individual
2 in Figure 1.

iii) We consider one possible trajectory such that (i, 1) originates from some A-individual and (i, 2)
originates from the mutant (see individual 5 in Figure 1):

$$R2(i)^{(3,ga)} \cap NR(i_1)^{(2)} \cap NR(i_2)^{(2)} \cap NC(i_1,i_2)^{(2)} \cap [R12(i_1)^{(1)} \cup R1|2(i_1)^{(1,ga)}] \cap NR(i_2)^{(1)}.$$

iv) We consider one possible trajectory such that (i, 1) and (i, 2) originate from the same A-individual
(see individual 3 in Figure 1):

$$NR(i)^{(3)} \cap NR(i)^{(2)} \cap R12(i)^{(1)}.$$

v) Finally, we consider two possible trajectories such that (i, 1) and (i, 2) originate from distinct
A-individuals (see individual 4 in Figure 1 for the second bracket):

$$\big(R2(i)^{(3,ga)} \cap NR(i_1)^{(2)} \cap NR(i_2)^{(2)} \cap NC(i_1,i_2)^{(2)} \cap [R12(i_1)^{(1)} \cup R1|2(i_1)^{(1,ga)}] \cap [R12(i_2)^{(1)} \cup R1|2(i_2)^{(1,ga)} \cup R2(i_2)^{(1)}]\big) \sqcup \big(NR(i)^{(3)} \cap NR(i)^{(2)} \cap R1|2(i)^{(1,ga)}\big).$$


Thanks to (3.3), and (3.13) to (3.15), we know that for all non negligible measurable events $C^{(1)}$, $C^{(2)}$
and $C^{(3)}$ occurring during the first, second and third phase respectively,

(4.6) P(C'(i),C'(2),C'(3),Fix-^) = P(C'(^\C'(2),C'(3),Arf a + OKie) 

where $O_K(\varepsilon)$ is a function of K and ε satisfying

(4.7) $\limsup_{K \to \infty} |O_K(\varepsilon)| \leq c\varepsilon$,

for $\varepsilon \leq \varepsilon_0$ where $\varepsilon_0$ and c are finite. Using the same inequalities we can decompose the right hand side
of (4.6) as follows

P(C'(i), {Tf < oo}) + P(C(2), {Mll+hl g [u;,,uj 2 ], -na\< {Tf < Sf}) 

+ P(C'(3\ A ) -HOif(e). 

Then from (13.141) we get 


P(C'(i), {Tf < oo}) = p(i)(C'(i))P(Tf < oo) -k OKie), 
from dSH)

P(^( 2 )^{ A^(t|+i,) ^ \E^(ll±hl-fia\ < £}|CW,{Tf < ^f}) = P(C(2)|cW,{rf < S^})+OK{e), 

and from (13.131) and (13.151) 

A ) = p(3)(c(3)|c(i\C'(2)) + OA(e), 

where (resp. corresponds to the event (resp. expressed in terms of the process N 

(resp. N). Putting everything together we finally obtain 

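(4.8) $P(C^{(1)}, C^{(2)}, C^{(3)} \mid \mathrm{Fix}^K) = P^{(3)}(C^{(3)} \mid C^{(1)}, C^{(2)})\, P(C^{(2)} \mid C^{(1)}, \{T_\varepsilon^K \leq S_\varepsilon^K\})\, P^{(1)}(C^{(1)}) + O_K(\varepsilon)$.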

By applying Propositions 1, 2, 3 and 4 we then can calculate the probabilities corresponding to the five
sets from Definition 1.3 and get the probabilities $(p_k, 1 \leq k \leq 5)$ defined in (1.13), which sum to one. Let
us detail the calculations for the case i): by applying (4.8), Proposition 4 and the Markov property, the
probability to see one of the two trajectories described in i) is

(4.9) $P(i, 1) = P^{(3)}(R2(i)^{(3,ga)})\, P^{(1)}\big([NR(i)^{(1)} \cup R2(i)^{(1)}] \cap NR(j)^{(1)}\big) + P^{(3)}(NR(i)^{(3)})\, P^{(1)}(NR(i)^{(1)}) + O_K(\varepsilon)$,

where i and j are two distinct individuals (exchangeability). But thanks to Proposition 4 we know that
the neutral genealogies of individuals i and j are nearly independent. Hence adding Proposition 1 leads
to

$$P^{(1)}\big([NR(i)^{(1)} \cup R2(i)^{(1)}] \cap NR(j)^{(1)}\big) = (q_1 q_2 + q_1(1 - q_2))\, q_1 q_2 + O_K(\varepsilon).$$

Applying Propositions 1 and 3 in (4.9) yields

$$P(i, 1) = (1 - \bar{q}_2)\, q_1^2 q_2 + \bar{q}_2\, q_1 q_2 + O_K(\sqrt{\varepsilon}) = p_1 + O_K(\sqrt{\varepsilon}),$$
where we recall the definition of $p_1$ in (1.13).
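
The same bookkeeping extends from case i) to the other four cases. The sketch below combines the first-phase probabilities of Proposition 1 with the third-phase probabilities of Proposition 3, assuming (as in the argument above) that the second phase contributes nothing and that the first-phase genealogies of the two parents $i_1$ and $i_2$ are independent in the limit. The function name is hypothetical, and the third-phase probability of no separation is passed as its own argument `q2_third` so that no identification with the first-phase quantity is presupposed.

```python
def sweep_probabilities(q1, q2, q3, q2_third):
    """Limiting probabilities of the five trajectory classes i) to v)."""
    # First phase (Proposition 1): NR, R2, R12, R1|2.
    nr, r2, r12, r1s2 = q1 * q2, q1 * (1 - q2), q3, 1 - q1 - q3
    # Third phase (Proposition 3): separation inside the a-population or not.
    separate, stay = 1 - q2_third, q2_third
    # Fate of locus 1 of parent i1 and of locus 2 of parent i2 after a separation.
    locus1_mutant, locus1_wild = nr + r2, r12 + r1s2
    locus2_mutant, locus2_wild = nr, r12 + r1s2 + r2
    p1 = separate * locus1_mutant * locus2_mutant + stay * nr
    p2 = separate * locus1_mutant * locus2_wild + stay * r2
    p3 = separate * locus1_wild * locus2_mutant
    p4 = stay * r12
    p5 = separate * locus1_wild * locus2_wild + stay * r1s2
    return p1, p2, p3, p4, p5

# Hypothetical values, for illustration only.
p = sweep_probabilities(q1=0.6, q2=0.7, q3=0.1, q2_third=0.8)
print(p, sum(p))
```

The five values add up to one for any admissible input, in line with the claim that the listed trajectories exhaust the limiting genealogies.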

Finally, we get the asymptotic independence of the neutral genealogies of the d sampled individuals
during the first and third phases by applying the multinomial version of the de Finetti Representation
Theorem (see [5] Chapter 4 for a simple proof) to the result of Proposition 4. The asymptotic independence
during the second phase follows from Proposition 2 as, with high probability, nothing happens.


5. Number of births and deaths during the selective sweep 

In this section we derive some results on birth and death numbers of the population processes N and
N, needed in Sections 6 and 7 to prove Propositions 1, 3 and 4.


5.1. Coupling with supercritical birth and death processes during the first phase. We are
interested in the dynamics of the process $N_a$ during the first phase, that is, before the time $T_\varepsilon^K$. The idea
is to couple this process with two supercritical birth and death processes, and deduce its dynamics from
well known results on birth and death processes. Recall the definition of the rescaled invasion fitness s
in (1.10), and for $\varepsilon < S_{aA}/(2C_{a,A}C_{A,a}/C_{A,A} + C_{a,a})$ define the two approximations

(5.1) $s - \dfrac{2C_{a,A}C_{A,a} + C_{a,a}C_{A,A}}{f_a C_{A,A}}\,\varepsilon =: s_-(\varepsilon) < s < s_+(\varepsilon) := s + \dfrac{2C_{a,A}C_{A,a}}{f_a C_{A,A}}\,\varepsilon.$


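Then for $t < T_\varepsilon^K \wedge S_\varepsilon^K$ the death rate of a-individuals in the process N equals that of the process N,
defined in (2.1), and satisfies

(5.2) $1 - s_+(\varepsilon) \leq \dfrac{d_a(N(t))}{f_a N_a(t)} = 1 - s + \dfrac{C_{a,A}}{f_a K}\big(N_A(t) - \bar{n}_A K\big) + \dfrac{C_{a,a}}{f_a K} N_a(t) \leq 1 - s_-(\varepsilon).$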

For Sf <t< , according to the definition of N in (13.171) . the death rate of a-individuals also satisfies 


1 - S-h(£) < 


d^jNAjS^ )eA^,iV<°H0) 


< 1 — s-(e). 


(5.3) 


faNa{t) 
























Hence, following Theorem 2 in [3] we can construct the processes $Z_\varepsilon^-$, $N_a$ and $Z_\varepsilon^+$ on the same
probability space such that almost surely:

(5.4) $Z_\varepsilon^-(t) \leq N_a(t) \leq Z_\varepsilon^+(t)$, for all $t < T_\varepsilon^K$,

where for $* \in \{-, +\}$, $Z_\varepsilon^*$ is a birth and death process with initial state 1, and individual birth and death
rates $f_a$ and $f_a(1 - s_*(\varepsilon))$.
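
As an illustration of how the two bounding processes are parametrized, the sketch below computes $s_\pm(\varepsilon)$ from (5.1) together with the individual birth and death rates of the two coupled processes. It assumes that the rescaled invasion fitness of (1.10) is $s = S_{aA}/f_a$ with $S_{aA} = f_a - D_a - C_{a,A}\bar{n}_A$, which is how the formulas of this section are read here; the function name and the numerical values are hypothetical.

```python
def coupling_parameters(eps, f_a, D_a, C_aA, C_Aa, C_AA, C_aa, n_A_bar):
    """Return s, s_-(eps), s_+(eps), the admissible bound on eps and the coupled rates."""
    S_aA = f_a - D_a - C_aA * n_A_bar   # invasion fitness of the mutant
    s = S_aA / f_a                      # assumed form of the rescaled fitness
    s_minus = s - eps * (2 * C_aA * C_Aa + C_aa * C_AA) / (f_a * C_AA)
    s_plus = s + eps * 2 * C_aA * C_Aa / (f_a * C_AA)
    eps_max = S_aA / (2 * C_aA * C_Aa / C_AA + C_aa)  # keeps s_minus positive
    return {
        "s": s,
        "s_minus": s_minus,
        "s_plus": s_plus,
        "eps_max": eps_max,
        # (birth rate, death rate) per individual of the lower and upper processes
        "lower_rates": (f_a, f_a * (1 - s_minus)),
        "upper_rates": (f_a, f_a * (1 - s_plus)),
    }

# Hypothetical ecological parameters, for illustration only.
params = coupling_parameters(eps=0.05, f_a=2.0, D_a=0.5, C_aA=1.0, C_Aa=1.0,
                             C_AA=1.0, C_aa=1.0, n_A_bar=1.0)
print(params)
```

For ε below `eps_max` both processes have a death rate smaller than their birth rate, so both are supercritical, as required by the coupling.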


Let denote the time of the first hitting of \ u\ by the process Na- 

(5.5) ■.= mi{t>Q,Na{t) = \u\}, u£R+. 

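If for $-1 < s < 1$, $Z^{(s)}$ is a random walk with jumps ±1 where up-jumps occur with probability $1/(2 - s)$
and down-jumps with probability $(1 - s)/(2 - s)$, we introduce

(5.6) $P_i^{(s)} := \mathcal{L}\big(Z^{(s)} \mid Z_0^{(s)} = i\big)$, $i \in \mathbb{N}$,

the law of $Z^{(s)}$ when the initial state is $i \in \mathbb{N}$ and for every $\rho \in \mathbb{R}_+$ the stopping time

(5.7) $\tau_\rho := \inf\{n \in \mathbb{Z}_+,\ Z_n^{(s)} = \lfloor \rho \rfloor\}$.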
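
Hitting probabilities for this random walk follow from the classical gambler's ruin computation: the ratio of the down- and up-probabilities is $1 - s$, so starting from i the walk reaches n before 0 with probability $(1 - (1-s)^i)/(1 - (1-s)^n)$ when $s \neq 0$. The sketch below evaluates this standard formula; the function name is hypothetical, and the formula is quoted as a well-known fact, not as the statement of a specific lemma of the paper.

```python
def hit_top_before_zero(i, n, s):
    """P(walk started at i reaches n before 0), up-probability 1 / (2 - s)."""
    if not 0 <= i <= n:
        raise ValueError("need 0 <= i <= n")
    if s == 0:
        return i / n                      # symmetric case
    ratio = 1.0 - s                       # down-probability / up-probability
    return (1.0 - ratio ** i) / (1.0 - ratio ** n)

# Hypothetical values: the probability of reaching level 50 from level 1
# is close to s when the target level is large.
print(hit_top_before_zero(1, 50, 0.2))
```

Quotients of this form, with s replaced by $s_-(\varepsilon)$ or $s_+(\varepsilon)$, are the kind of expression that appears in the bounds of this section.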

5.2. Number of jumps of Na during the first phase. 


5.2.1. Expectation of the number of upcrossings. Let us recall (4.1) and consider $k < \lfloor \varepsilon K \rfloor$.
Then the number of upcrossings from k to k + 1 during the first phase is:

(5.8) $U_k^K(1) := \#\{m,\ \tau_m^K < T_\varepsilon^K,\ (N_a(\tau_m^K), N_a(\tau_{m+1}^K)) = (k, k+1)\}$,

where (1) stands for the first phase. Recall (3.1) and (5.1), and introduce a real number

(5.9) $\lambda_\varepsilon := (1 - s_-(\varepsilon))^3 (1 - s_+(\varepsilon))^{-2}$,

which belongs to (0,1) for ε small enough. We have the following result:


Lemma 5.1. There exist three positive finite constants c, Kq and Eq such that for K > Kq and e < Eq: 
If j ^ k < IeK\ and ua £ I^ ± 1, 


(5.10) 




< CE. 


If k < j < \eK\ and ua £ I^ ± 1, 

If k' <k < \_eK\ and ua & if ± 1, 


(5.12) 




Proof. The idea, which comes from [19] and will be used several times throughout Section 5, is to compare
the number of upcrossings with geometric random variables. Suppose first that j < k. Then on the event
$\{T_\varepsilon^K < \infty\}$ the process $N_a$ necessarily jumps from k to k + 1. Being in k + 1, it either reaches $\lfloor \varepsilon K \rfloor$
before k, or it goes back and then again from k to k + 1 and so on. We first approximate the probability
that there is only one jump from k to k + 1. As we do not know the value of $N_A$ when $N_a$ hits k for the
first time, we bound the probability using the extreme values it can take. Recall Definitions (5.5) and
(5.7). The upper bound is derived as follows:

(5.13) pW _^.)(C/f(l) = l) < 


where we use (ig and for (si,S 2 ) £ (0,1)^ 

-nfsi) 


sup < erf) 

UA^Ig ±1 

P{nA,k+l){Tf <Crf)^ {s+{e),s-{e)) 

sup -— r<% 1 

P(uA,k+i){Tf < 00) 
Similarly, we show that j) = 1) ^ gg^j^g way, we can approximate the

probability that there are at least three jumps from k to k + 1 knowing that there are at least two jumps,
and so on. We deduce that we can construct two geometric random variables $G_1$ and $G_2$, possibly on an
enlarged space, with respective parameters /\ 1 and g^ch that 

(5.15) Gi < [/f (1) < C? 2 , a.s. 

In particular, taking the expectation we get from dEH) 

(5.16) 


s+(£)(l - (1 - s-(e))L'^'^J) 


< 


(1 - (1 - s-(g))L^^J-fe)(l - (1 - s+{e)f+^) 

s-(e)(l - (1 - s+(£))L®-^J) 


According to (1.10) and (5.1), $0 < s < 1$ and $|s_+(\varepsilon) - s_-(\varepsilon)| \leq (4C_{a,A}C_{A,a} + C_{a,a}C_{A,A})\,\varepsilon/(f_a C_{A,A})$.
Hence the last inequality and straightforward calculations lead to (5.10).

Let us now assume that k < j. Then we have 


riA^Is :tl 


P(nAj)(L^ < 00) 


^ Vk*^^^\TeK < To)Pj" < Tex) ^ (1 - S-(£))l~^ 

<To) ~ s+(e)s_(e) 

where we again used (ig and dEIJ. Moreover, the same proof as for (15.151) leads to: 

> 1] < < s-_\e), 

where we used Equation (B.3). This ends the proof of (5.11). The last inequality, (5.12), has been stated
in [19] (Equation (7.26)). □

5.2.2. Expectation of hitting numbers. Let us recall (5.8) and introduce for $0 < j < k < \lfloor \varepsilon K \rfloor$ the total
number of downcrossings from k to k − 1,

(5.17) $D_k^K(1) := \#\{m,\ \tau_m^K < T_\varepsilon^K,\ (N_a(\tau_m^K), N_a(\tau_{m+1}^K)) = (k, k-1)\}$,

and the number of hittings of the state k by the process $N_a$ before the time $T_\varepsilon^K$:

(5.18) $V_k^K(1) := U_{k-1}^K(1) + D_{k+1}^K(1) = \#\{m,\ \tau_m^K < T_\varepsilon^K,\ N_a(\tau_{m-1}^K) \neq k,\ N_a(\tau_m^K) = k\}$.

Recall the definition of $\lambda_\varepsilon \in (0,1)$ in (5.9). We can state the following Lemma, which will be useful to
get bounds on the number of upcrossings of the A-population during the first phase (see Lemma 5.4):

Lemma 5.2. There exist three finite constants c, Kq and eg such that for K > Kg, e < £g and k' < k < 
[eK\: 


E(i)[E/'(l)]- 


(2-s)(l-(l-s) 


- - 


{l-sf) 


<ce, and \Cov^^\V^{l),V^{l))\ < c{e+Xi^ 


Proof. Under $P^{(1)}$, the a-population size goes from 1 to $\lfloor \varepsilon K \rfloor$, thus the number of downcrossings from
k + 1 to k is equal to the number of upcrossings from k to k + 1 minus 1. Adding (5.18) yields

$$V_k^K(1) = U_{k-1}^K(1) + U_k^K(1) - 1, \quad P^{(1)}\text{-a.s.}$$

We get the first part of the Lemma by taking the expectation and applying (5.10). The proof of the
second part follows that of (5.12), and once again we can find the details in the proof of Equation (7.26)
in [19]. □
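
The pathwise identity used in this proof can be observed on a simulated trajectory. The sketch below is an illustration only: it replaces the a-population size by the simple random walk with up-probability $1/(2-s)$, started at 1 and kept only if it reaches a level n before 0 (a stand-in for conditioning on the success of the first phase); the function names and parameter values are hypothetical.

```python
import random

def sample_path(n, s, rng):
    """Random walk from 1, up-probability 1 / (2 - s), kept if it hits n before 0."""
    p_up = 1.0 / (2.0 - s)
    while True:
        path = [1]
        while 0 < path[-1] < n:
            path.append(path[-1] + (1 if rng.random() < p_up else -1))
        if path[-1] == n:       # rejection step: discard paths absorbed at 0
            return path

def crossing_counts(path, k):
    """Upcrossings k -> k+1, downcrossings k+1 -> k, and arrivals at level k."""
    steps = list(zip(path, path[1:]))
    ups = sum(1 for x, y in steps if (x, y) == (k, k + 1))
    downs = sum(1 for x, y in steps if (x, y) == (k + 1, k))
    hits = sum(1 for _, y in steps if y == k)
    return ups, downs, hits

rng = random.Random(0)
path = sample_path(n=30, s=0.2, rng=rng)
for k in (1, 5, 10):
    print(k, crossing_counts(path, k))
```

On every retained path the number of downcrossings from k + 1 to k equals the number of upcrossings from k to k + 1 minus one, and the number of arrivals at k equals the sum of the upcrossings from k − 1 and from k, minus one.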
5.2.3. Number of upcrossings during an excursion above or below a given level. We now focus on the
number of upcrossings from k to k + 1 during an excursion above or below l. Let us denote by $\sigma_l^K(1)$ the
jump number of the first hitting of l before the end of the first phase: for $l < \lfloor \varepsilon K \rfloor$,

(5.19) erf (1) := inf{m,Tf < f^,Na{T^) = 0, 

and for 1 < /c, Z < \eK\ and ua € ± 1, 

(5.20) C/f,.i.fc(l) := #{m < af (1), (iV,(Tf), fV^rf+f) = (fc, fc + 1)}. 

Then, if we denote by the real number 

(5.21) M,:=(l-s_(e))fl-s+(e))-f 

which belongs to (0,1) for e small enough, we can derive the following bounds: 

Lemma 5.3. There exist three positive, finite constants c, Kq and Sq such that for K > Kq, £ < Eq, 
1 < k < I < [eit'J and ua G if ± 1, 

< oo] V ^ 

Proof. Equations (B.5) and (B.6) in [19] state that for fc < / < [eit'J and ua & if ^ 1, 
^[uA,k+lf^nA,k,lf) > l|crf (1) < OO) < c(l - S_(e))'“'=, 

and 

' 1 — s+(e) \ 


P 


( 1 ) 

(uA.k + l) 


> l,af (1) < oo) > 


for a finite c. By comparing Uf^ fci(l) with a geometric random variable we get the first inequality. 
To bound the expectation of upcrossings from k to k + 1 during an excursion below l we first bound
the probability to have at least one jump from k to k + 1 during such an excursion. By definition, $N_a$
necessarily hits l − 1 during the excursion below l. Recall Definitions (4.3), (5.5) and (5.7). Then for
every $n_A$ in $I_\varepsilon^K \pm 1$,


P 


<<^fff <^) = 


< Ookfc < ^f)'^(nA,l-l)ff < <^f) 

P{nA,l-l)(J'f < oo) 


^ < TpjT’f < n) ^ (1 - S-(g))^~^~^ 


< T-o) 


S-(f 


where we used dEH). The next step consists in bounding the number of upcrossings from k to k -\- 1 
during the excursion knowing that this number is larger than one: for ua G if ± 1, 


P 


( 1 ) 

{nA,k+l) 




'I'{uA,k+i){Tf < oo\af < gf )P(n.4,fc+l)(g'f < ^k) 


^(nA,k+l){Tf < oo) 


> 


V, 


(«-(e))/ 


I {tsK < ro)Pi+/^^Vz < T-fe) 2 / ^ 

— )1 


'Pifl^'’\reK < To) 


where we again used (IB.IP Hence on the event {Uf^ i j^{l) > 1}, Uf^ i kf) smaller than a geometric 
random variable with parameter s?_(£:) and we get: 


which ends the proof of Lemma 15.31 


(l-s(£)) 


l-k-l 


s_(e) 


□ 
5.3. Number of jumps of $N_A$ during the first phase. We introduce for $k < \lfloor \varepsilon K \rfloor$ the number of
upcrossings of the A-population when the a-population is of size k:

(5.22) (1) := = k}. 

We are now able to get bounds for the expectations and covariances of these quantities: 

Lemma 5.4. There exist three finite constants c, Kq and Sq such that for K > Kq, e < Sq and k < [eK\, 

k « _ 1 7 k 




fAnAK log k 


2=1 


Sfa 


<cK{l + e\ogk) and Var^^^ < cK^ {1 + e \og^ k). 




Proof. The proof is based on the comparison of the A- and a-population jump rates. Let us first focus
on the a-population. For $k < \lfloor \varepsilon K \rfloor$ and $n_A \in I_\varepsilon^K \pm 1$,

^ = fc± l)P(„^,fc)(iV„(dt) = fc± 1)/P(„^,fc)(ff < OO) 

< < oo)P(„,,fe)(7VJ<5t) = fc± < oo) 


< 


(1 -I- ce) 


■ ((1 - (1 - sf+^)fak + (1 - s)(l - (1 - sf-^){Da + CaAUA 


6t 


1 - (1 - 5)^= 

(5.23) $= (1 + c\varepsilon)\, f_a (2 - s)\, k\, \delta t$,

for a finite constant c and ε small enough, where $\delta t$ is a small time step and by abuse of notation we
did not indicate the $o(\delta t)$'s. We used the definition of $P^{(1)}$ in (4.3) for the equality, Coupling (5.4) for
the first inequality, (B.1) for the second one, and the equality $S_{aA} = f_a - D_a - C_{a,A}\bar{n}_A$ for the last one.
Reasoning similarly we get:


(5.24) 


(1 - ce)U2 - s)k6t < P[Z,k)iNam ^ k) 


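Let us now focus on the number of upcrossings of the A-population. The definition of N in (3.17) and
Bayes' Theorem yield

(5.25) $(1 - c\varepsilon)\, f_A \bar{n}_A K\, \delta t \leq P^{(1)}_{(n_A,k)}\big(N_A(\delta t) = n_A + 1\big) \leq (1 + c\varepsilon)\, f_A \bar{n}_A K\, \delta t$,

for a finite c and ε small enough. Indeed, from Coupling (5.4) and Equation (B.1) we get the following
bound, independent of $n_A$ in $I_\varepsilon^K \pm 1$:

$$\frac{1 - (1 - s_-(\varepsilon))^k}{1 - (1 - s_-(\varepsilon))^{\lfloor \varepsilon K \rfloor}} \leq P_{(n_A,k)}\big(T_\varepsilon^K < \infty\big) \leq \frac{1 - (1 - s_+(\varepsilon))^k}{1 - (1 - s_+(\varepsilon))^{\lfloor \varepsilon K \rfloor}}.$$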

Hence there exist two finite constants c and eo such that for every £ < £o, if we introduce the parameters 

1 ^ , fAUAK ^ , fAflAK 1 

:=! + (!- ce)— -r-^ < 1 -I- (1 -|- c£) 


(5.26) 


{e) (2 s)fak 

we can deduce from (15.231) to (15.251) that for k < [eK\ 

(5.27) A^7jK 


(2 s)fak q^^\e) 




v,«(l) 




- 1 


where for j £ {1, 2}, (G\,) , * G N) is a sequence of geometric random variables with parameter q^'^ (e) 

independent of V)^(l) (defined in (15.181) 1 for all I < [sK\ . Hence a direct application of Lemmas l5.2l and 
IB.21 leads to 

(5.28) 


1E;(i) 




fAnAK 

sfak 


(l-(l-s)'=-(l-s)L"^J-'= 


K 


for a finite c and ε small enough. This implies the first inequality of Lemma 5.4.

Let us now bound the second moment of Z^^(l) and the expectation of {1)U^{!) for fe ^ L The 
first upper bound follows again from a direct application of Lemmas 15.21 and IB. 21 We get 


(5.29) EW[(Wf(l))2] <eW[( ^ 


< 


2(EW[H,^(1)])2 


ffAnAK\^ 


W(i) 


(<zr(£))^ 
for a finite c and ε small enough. A new application of the same Lemmas yields, for $k < l < \lfloor \varepsilon K \rfloor$


(5.30) 


1E(i) 








[faSYkl 


where we used that $E^{(1)}[XY] = E^{(1)}[X]\,E^{(1)}[Y] + \mathrm{Cov}^{(1)}(X, Y)$ for any real random variables (X, Y).

From (5.27) to (5.30) and (B.2) we deduce that there exists a finite c such that for ε small enough and
$k < \lfloor \varepsilon K \rfloor$,

(5.31) 


EW[(^Wf(l))'] <(l + c£)( 


fAnAK\ogk\'^ 


faS 


) +cK\ 


Reasoning similarly to get the lower bound, we obtain 

k 

(5.32) 


K 


i=l 


f An aK log k^"^ 


faS 


^ < ciF^(l + elog^ A:). 


Adding the first inequality of Lemma 5.4 we conclude the proof.


□ 
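
The logarithm in Lemma 5.4 comes from summing the level-by-level estimates over the a-population sizes. The sketch below checks this on the profile $(1 - (1-s)^i - (1-s)^{n-i})/i$, which is how the bracket in (5.28) is read here, with n standing for $\lfloor \varepsilon K \rfloor$; the function name and the numerical values are hypothetical.

```python
import math

def summed_profile(k, n, s):
    """Sum over i = 1..k of (1 - (1-s)**i - (1-s)**(n-i)) / i, for k <= n."""
    a = 1.0 - s
    return sum((1.0 - a ** i - a ** (n - i)) / i for i in range(1, k + 1))

# The difference to log(k) stays bounded by a constant depending on s only,
# which matches an error term of order K once multiplied by f_A * n_A_bar * K / (s * f_a).
n, s = 10**4, 0.3
for k in (1, 10, 100, 1000, 9999):
    print(k, summed_profile(k, n, s) - math.log(k))
```

The harmonic part contributes log k up to a bounded term, while the two geometric corrections sum to at most $\log(1/s) + 1/s$.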


5.4. Coupling with subcritical birth and death processes during the third phase. We couple 

the process Na with two subcritical birth and death processes to control its dynamics. We recall the 
definition of Aff in (I3.12|) and introduce 

(5.33) s:=\SAa\/fA. 


Let us define for ε small enough,

M"CA,a 


(5.34) 


s — 


fA 


e =: s-{e) < s < s+(e) := s + 


Ca,a + M"Ca,, 

Ta 


-e, 


where M" has been defined just before Definition (13.101) . Then, according to the definition of N in (|3.18p . 

we can follow Theorem 2 in [3] and construct the processes , N and Yf on the same probability space 
such that on the event 

(5.35) Y+{t) < NAit) < Y-(t), for all +t^<t< Tf + te + a.s., 

where for * G Ye i® ^ birth and death process with initial state Na[T^ + te) and individual 

birth and death rates /a and /a( 1 + s*(e)), and we recall that is the analog of (defined in 

(|3.111) 1 for the process N. 


Recall Definition (15.6L and let us introduce for i G N and for p G R+ the stopping time 

(5.36) Vp ■= inf{u G Z+, = [pj}. 


5.5. Number of jumps of $N_A$ during the third phase. Similarly as in (5.18) we introduce for
$1 \leq k < \lfloor \varepsilon K \rfloor$ the random variable $V_k^K(3)$ which corresponds to the number of hittings of state k by the
process $N_A$ during the third phase. Recall Definitions (3.9), (3.10) and (5.34). We have the following
approximations:


Lemma 5.5. Let u he in [a;i,a; 2 ]. There exist three finite constants c, Kq and eg such that for K > Kg, 
e < eg and na in ± 1, if \ uK\ < k < [eK \, 


E 


(3) 


[Vf(3)] < (l + c£)^(l + s-_(e))L“^J-^ 


and if k < [uATj, 


(3) 

(LuRj.Ila) 




-(l-(l + s)-'=-(l + s)'=-L"^J) 


< ce. 
Proof. The proof is very similar to that of (5.10), hence we do not detail all the calculations and refer to
the proof of Lemma 5.1. First we consider $\lfloor uK \rfloor < k < \lfloor \varepsilon K \rfloor$ and approximate under $P^{(3)}$ the probability
for $N_A$ to hit k before the extinction of the A-population. Indeed, if $k \leq \lfloor uK \rfloor$, we know that $N_A$ hits
k P^^ha.s. Let luK\ < k < [eLfJ. Then for every Ua G ± 1, Equation (IB.1|) implies 


(5.37) 


p(3) 

^{\uK\,n^) 


{Na hits k) < 


< Vck) 


< 


ce 


(1 + ’ 


for a finite c, ε small enough and K large enough. The second step consists in counting how many times
the process $N_A$ hits k during the third phase knowing that it happens at least once. Once again we will
compare this number with geometric random variables, by approximating the probability to have only
one jump. The following inequality follows the spirit of (5.13). The only difference is that in the third
phase $N_A$ is coupled with subcritical birth and death processes, whereas in the first phase $N_a$ was coupled
with supercritical birth and death processes. For every ria G ± 1 and k < \eK\, 


P(feL)(^A(t)<fc,Vt>0)> 




(uo < Vk)Q^k ^‘''^\vk-i < Ufc+i) 


< VeK) 


> 


(1 — ce)s 


(2 + s)(l - (1 + s)-'= - (1 +s)'=-h-^J)’ 


We derive the upper bound similarly and end the proof by comparing the hitting numbers with geometric
random variables. For $\lfloor uK \rfloor < k < \lfloor \varepsilon K \rfloor$ we have to multiply the expectation of the geometric random
variables by the probability to hit k at least once, approximated in (5.37). □

5.6. Number of births of a-individuals during the third phase. Recall (3.16) and let $U_k^K(3)$ be
the number of births in the a-population during the third phase when $N_A$ equals $k < \lfloor \varepsilon K \rfloor$:


(5.38) l/f (3) := #{m, Tf + t, < ^a{t^) = K and {{k{T^+i) - k{T^) = 1} 

or {Na{T^+i) = Na{T^)k^‘'\k+i) 7^ 

We now state an approximation for the expectation of $U_k^K(3)$. We do not prove this result as it is obtained
in the same way as Lemma 5.4: the birth rate of the a-population is close to $f_a \bar{n}_a K$, the jump rate of the
A-population is of order $(2 + s) f_A k$ when $N_A = k$ and the expectations of the hitting numbers for the
A-population are given in Lemma 5.5. The only difference is that the A-population size can hit values
bigger than the initial value of the third phase, $N_A(T_\varepsilon^K + t_\varepsilon)$. However the probabilities to hit such values
decrease geometrically (see Lemma 5.3) and they have a negligible influence on the final result. Thus we
get


Lemma 5.6. There exist three finite constants c, eg and Kg 

/c 

1 t rt K Incr tc 

<cK{\+£\ogk) and 




faUaK log k 


sfA 


such that for e < £g, K > Kg and k < [e/Fj 
k 

Var(3) (3)) < cK^ {1 + e log^ K). 

i=l 


6. First phase 

This section is dedicated to the proof of Proposition 1. We prove that there are only four different
possible ancestral relationships of the two neutral loci and calculate the probabilities for the non-negligible
possibilities.

6.1. Coalescence and recombination probabilities, negligible events. Recall Definition 4.1 and
define, for $j \in \{1, 2\}$,

$$r_j^* := r_1 + \mathbf{1}_{\{j=2\}}(r_2 - 2 r_1 r_2), \quad \text{and} \quad r_{(1,2)}^* := r_1 r_2,$$

which denote the probability to have one (resp. two) recombination(s) somewhere before the locus Nj
(resp. before the locus N2) at a birth event.

Definition 6.1. For $(\alpha, \alpha') \in \mathcal{A}^2$, $j \in \{1, 2\}$ and $n = (n_A, n_a) \in \mathbb{N}^2$ we define:

$p_{\alpha\alpha'}^{(c,j)}(n)$ := probability that two randomly chosen neutral alleles, located at locus Nj and associated
respectively with alleles $\alpha$ and $\alpha'$ at time $\tau_m^K$, coalesce at this time conditionally on $(N_A, N_a)(\tau_{m-1}^K) = n$
and on the birth of an individual carrying allele $\alpha$ at time $\tau_m^K$.

$p_{\alpha\alpha'}^{(j)}(n)$ := probability to have one (and only one) recombination from the $\alpha$- into the $\alpha'$-population before
locus Nj conditionally on $(N_A, N_a)(\tau_{m-1}^K) = n$ and on the birth of an individual carrying allele
$\alpha$ at time $\tau_m^K$.

$p_{\alpha\alpha'}^{(1,2)}(n)$ := probability to have a double recombination under the same conditions.


Then we have the following result: 


Lemma 6.1. Let a & A, n = {nA,na) & such that na < i 1 and j G {1,2}. Then 

there exists a finite c such that, 


pic,j) 

r'aa 


(n) = 


1 - 


r* fAnA 




a(na + l)V fAnA + fana' 


Ua 


and PAa\n)< 


{Ua + l){fAnA + fana) 




Proof. The proof of the two equalities can be found in [19] (Lemma 7.1) as the expression is the same
for $n_A \in I_\varepsilon^K$ or $\mathrm{dist}(n_A, I_\varepsilon^K) = 1$ (where dist is the canonical distance on $\mathbb{R}$). The only difference is that
we consider two neutral loci and have to exclude the double recombination case. Indeed, if there are
simultaneous recombinations the alleles located at SL and N2 in the newborn originate from the same
parent. The expressions of P^Aa {n) in the case where ua £ are also stated in [T^] (Lemma 7.1), and 
from the definition of N in (13.171) we get that when dist{nA, if) = 1, P^AA{n) = and PAa\n) = 0. 
This ends the proof. □ 


Next we focus on the recombination probabilities: 

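Lemma 6.2. Let $\alpha \in \mathcal{A}$, $n = (n_A, n_a) \in \mathbb{N}^2$ such that $n_a < \lfloor \varepsilon K \rfloor$, $n_A \in I_\varepsilon^K \pm 1$ and $j \in \{1, 2, (1,2)\}$.

Then there exist two finite constants $c$ and $\varepsilon_0$ such that for every $\varepsilon < \varepsilon_0$,

$$p^{(j)}_{aa}(n) = \frac{r_j^* f_a (n_a - 1)}{(n_a + 1)(f_A n_A + f_a n_a)}, \qquad p^{(j)}_{aA}(n) = \frac{r_j^* f_A n_A}{(n_a + 1)(f_A n_A + f_a n_a)},$$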


(6.1) PaI(ii) < ^ (1 -ce)—< PAAi^A, k)<—, k< [eK\ 

K log K riA R-A 


Proof. The second equality is stated in [19], Equation (7.2).

Conditionally on the birth of an a-individual and the state of the process at the (m - 1)-th jump, the
probability of picking the newborn when choosing an individual at random amongst the a-individuals is
equal to $1/(n_a + 1)$. A recombination before the locus Nj (or before locus N1 and locus N2 if j = (1, 2))
happens with probability $r_j^*$, independent of all other events. Finally, the probability that the second
parent is an a-individual but is different from the first parent is equal to $f_a(n_a - 1)/(f_A n_A + f_a n_a)$. This
proves the first equality.

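When $n_A \in I_\varepsilon^K$ we get similarly that

$$p^{(j)}_{AA}(n) = \frac{r_j^* f_A (n_A - 1)}{(n_A + 1)(f_A n_A + f_a n_a)} \quad \text{and} \quad p^{(j)}_{Aa}(n) = \frac{r_j^* f_a n_a}{(n_A + 1)(f_A n_A + f_a n_a)},$$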


and from the definition of N in (13.171) we obtain that when dist{nA,lf) = 1, Paa^''^) — '^ 2 {nA — rj/n\ 
and P^Aaf^) — Condition (11.111 completes the proof. □ 
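
The counting argument in the proof of Lemma 6.2 can be checked numerically. The following is a minimal sketch in Python, assuming the product form described in the proof (the newborn is picked with probability 1/(n_a + 1), a recombination occurs with probability r, and the second parent is chosen proportionally to fertility). The function names and the parameter values are illustrative and are not taken from the paper.

```python
from fractions import Fraction


def r_star(r1, r2, j):
    """Probability of exactly one recombination before locus Nj (j = 1 or 2)."""
    if j == 1:
        return r1
    # Exactly one of the two independent recombinations occurs.
    return r1 + r2 - 2 * r1 * r2


def p_rec_a_to_A(r, f_A, f_a, n_A, n_a):
    """Birth of an a-individual: newborn picked, recombination, second parent of type A."""
    return Fraction(1, n_a + 1) * r * Fraction(f_A * n_A, f_A * n_A + f_a * n_a)


def p_rec_a_to_a(r, f_A, f_a, n_A, n_a):
    """Same event, but the second parent is another a-individual (requires n_a >= 1)."""
    return Fraction(1, n_a + 1) * r * Fraction(f_a * (n_a - 1), f_A * n_A + f_a * n_a)


# Illustrative values (assumptions): integer fertilities, a small mutant population.
r = Fraction(1, 10)
f_A, f_a, n_A, n_a = 1, 2, 1000, 9
p_aA = p_rec_a_to_A(r, f_A, f_a, n_A, n_a)
p_aa = p_rec_a_to_a(r, f_A, f_a, n_A, n_a)
```

With these values one can compare `p_aA` with `r / (n_a + 1)` and `p_aa` with `f_a * r / (f_A * n_A)`, the elementary upper bounds that follow directly from the two product formulas.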


Remark 1. Let us recall the definition of if in dm. Then there exist three finite constants c, eo and 
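$K_0$ such that for $\varepsilon < \varepsilon_0$, $K > K_0$, $j \in \{1, 2, (1,2)\}$, $n_A \in I_\varepsilon^K \pm 1$ and $k < \lfloor \varepsilon K \rfloor$,

(6.2) $$(1 - c\varepsilon)\frac{r_j^*}{k+1} \leq p^{(j)}_{aA}(n_A, k) \leq \frac{r_j^*}{k+1}, \qquad p^{(j)}_{aa}(n_A, k) \leq \frac{f_a\, r_j^*}{f_A\, n_A} \leq \frac{c}{K \log K}.$$
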


Recalling the definitions of the m-th jump time and the number of jumps in (4.1) and (4.2), we define
for $j \in \{1, 2, (1,2)\}$, $m \in \mathbb{N}$ and an individual $i$ uniformly picked at the end of the first phase,

(6.3) $(\alpha i j)_m := \{m \leq J^K(1)$ and the j-th locus/loci of the i-th individual is/are associated to an allele $\alpha$ at the m-th jump time$\}$.

The notation $(\alpha i 1)_m, (\alpha' i 2)_m$ here implies that the two neutral loci of individual $i$ are associated to two
distinct individuals at the m-th jump time, for any $\alpha, \alpha' \in \mathcal{A}$.
To approximate the genealogy of the neutral alleles sampled at the end of the first phase we will focus
on the recombinations and coalescences which may happen during this time interval. Keep in mind that 
when looking at coalescing neutral loci, the parent’s type may differ from the type of up to one child. 
We first prove that we can neglect some event combinations. Sample 2d distinct individuals uniformly at 
the end of the first phase (maximal number of ancestors for the 2d neutral alleles sampled at the end of 
the sweep) and define: 

aAa: a neutral allele recombines from the a-population to the A-population, and then (backwards in
time) back into the a-population

CR: two neutral alleles coalesce in the a-population, and then (backwards in time) recombine into the
A-population

CA: two neutral alleles coalesce and at least one of them carries the allele A at the time of coalescence

2R: a neutral allele takes part in a double recombination (i.e. a recombination before N1 and a
recombination before N2 at the same birth event)

R2a: a recombination separates the two neutral loci of an individual within the a-population

We can bound the probability of these events as follows:


Lemma 6.3. There exist three positive finite constants $c$, $K_0$ and $\varepsilon_0$ such that for $\varepsilon < \varepsilon_0$ and $K > K_0$

p(i)(aAa)-bP(i)(Ci?)-bP(i)(2i?)-bP(^Hi?2a) < —^, and p(^)(CA) < 

log K K 

Proof. The probabilities of events aAa, CR and CA are bounded in [19], Lemma 7.3 and Equation (7.19),
for the process $N$. But according to Lemmas 6.1 and 6.2 the coalescence and recombination probabilities
for the process $\tilde N$ are very close or even smaller when $\mathrm{dist}(n_A, I_\varepsilon^K) = 1$ than when $N$ and $\tilde N$ are equal.
Hence we just have to bound the probability of 2R and R2a. If a neutral allele experiences a double
recombination, it happens either when it is associated with an allele a, or with an allele A. From Lemma
6.2 and the fact that $r_1$ and $r_2$ are of order $1/\log K$ we get for $k < \lfloor \varepsilon K \rfloor$:


sup 


+PaA^)inA,k) 


< 


and sup 


[PAa 


-Paa 


' A ){ nA , k ) 


< 


K log^ K 


Recall the definitions of 17^(1) and {!) in (15.811 and (I5.22|) respectively. As a birth of an a-individual 


{k + 1) log K nAEiP±l 

ffa)andZ7f( 

is needed to have a recombination from the a- to the a'-population, we can bound the probability to 
have a double recombination by: 


p(i)( 2 i?)<^_EW[^ V + 

^ ’ ~ loff2 K I ^ \ k + 1 K J 


log K 


k=l 


By applying inequality (15.1011 and Lemma [5^ we succeed in bounding P(^)(2i?) by a constant over log A". 
It remains to consider the event R2a of a recombination within the a-population. Define the first time 
(with respect to the backwards in time process) that this event happens: 

R^]{i) := sup{m, m < J^(l) and both neutral loci of the i-th individual are 
associated to distinct a-individuals at the (m — l)th jump, 

where {i) = —oo if the event does not happen during the first phase of the sweep. Then, 


(6.4) 


VeK\-l 


> 0) = E 


1=1 


< 


VeK\-l 

= E E <-^^(l)i^a(T^_i) = ^^o(t^) = / + 1, (a*l)m, (ai2)m,Vm'> m : (ail2)m0 

I—I m<oo 

LeAJ-l 

E E (7'ia(»^A,/)P(^^^_i+i)(Vm > 0 : (aH2)™))p(i)(m < J^(l),iVa(T,^_i) =/,iVa(r^) 


m<oc 


LeAJ-l 


^ E 




KlogK 




log AT’ 


l+l) 
by (5.10) and (6.2). □

To simplify the notations we will denote the union of all negligible events by 

(6.5) $NE := aAa \cup CR \cup CA \cup 2R \cup R2a.$


6.2. The two loci of one individual separate within the A-population. Having excluded events
of small probability, there are exactly two ways for the neutral alleles of an individual sampled at the
end of the first phase to originate from two distinct A-individuals. The two possibilities were already
described on page [TU] and represented in Figure 01 The ideas which are pursued in this section are similar 
to the ones from [2], but there are extra difficulties due to the randomness of the population size. 


6.2.1. Event [2, 1](4^. The aim of this section is to prove the following approximation: 


Proposition 5. Let $i$ be an a-individual sampled uniformly at the end of the first phase. There exist two
finite constants $c$ and $\varepsilon_0$ such that for $\varepsilon < \varepsilon_0$,


lim sup 

K—>cx) 


pW([2,i]:4-) 


_ g-^ logLeA"] _|_ logLeA'J 

.ri +r 2 ri+ r 2 


< Cy/e. 


We first give a preliminary lemma before proving Proposition 5.
{1, 2, (1, 2)} and m G N, 


Recall (021) and define for j G 


(6.6) $R(i,j) := \sup\{m, m \leq J^K(1)$ and the j-th locus/loci of the i-th individual is/are associated to an allele A at the (m - 1)-th jump time$\}$,

the last jump (forwards in time) when the j-th locus/loci of the i-th individual belongs to the A-population
(with sup0 = —oo). To prove Propositionthe idea is to decompose the event [2, l]^1 according to the 
different possible a-population sizes when the first (backwards in time) recombination between 7V1 and 
N2 occurs. 


(6.7) P^^\[2ATaH) = > i?(f,l) > 0) 

[eK] 

= Y, P^'^(i?(Ll)>0,R(z,2)>i?(i,l),iV„(T|;,,2))=Z) 

= ^ pW(i?(z,2) >i?(z,l),fV,(Tf(,,2)) > 0|i?(z,2) > R{z,l),Na{rY2))=l)- 

1=1 

In the following lemma, which then gives rise to the proof of Proposition 5, we consider separately the
two probabilities of the above product:

Lemma 6.4. There exist three finite constants $c$, $K_0$ and $\varepsilon_0$ such that for $K > K_0$, $\varepsilon < \varepsilon_0$ and $l < \lfloor \varepsilon K \rfloor$,


( 6 . 8 ) 

pW(i?(z,2) >R(z,l),fV,(rf(,,2)) 


n 1-(1-5)L^^J ^-(1 -s)^+^ n±r 2 i„,i£^ 

’ ^ s{l + l) 


< 


Cyfe 

llogK 


and 


i-i 


(6.9) P(i)(i?(z, 1) > 0|i?(z, 2) > R(z, 1), N^iTY2)) = 0 - E 


ri 


k=l 


s{k + 1) 




< C^/£. 


Proof of Proposition 5. From Lemma 6.4 and Equation (6.7) we get the existence of a finite $c$ such that
for $K$ large enough and $\varepsilon$ small enough,


pW([2,i]:fj)< 


VeK\-l 

E 


z=i 


r r 2 _ "i +-2 los-Li^ 
Isil + lf 


!— ^ ^ 

I log K\ i s(fc + 1) ^ 


-|- c\/e 


( 6 . 10 ) 
Rewriting the second term in brackets and applying Lemma IB. 31 with cat / log


ri/s yields: 




i-i 


E l n 


s ^' fc + 1 

fe=i 


“t" C^J~E ^ 


< 1 - e 




=\/e, 



for K large enough, e small enough and a finite c, whose value can change from line to line and which 
can be chosen independently of 1. We use in the last inequality Condition (ED which claims that 
limsupA:_>oo ^ogK < oo. Including the last inequality in (Ib.lOp gives 


pW([2,i]:4-) < 


L6AJ-1 

E 


1=1 


EH—p-^logh-ffl /"polos' 
s(/ + i) r 



+ c\/e, 


for a finite c, K large enough and e small enough, where we again use (ED which ensures that exponential 
terms are bounded away from zero and infinity in the following sense: 


ri+r2 


- < liminf < limsupe 

C K^oo K=>oo 

for a positive and finite c. Applying again Lemma IB.31 we get: 




\ogleK\ 


< C 


P'^’([2,1]a") < i— -e-^'°«LeAj ^ 

’ Vr’i+r-2 


Cl 


Cl + C2 


^^14^ logiE-K’J ^ 


+ Cy/s. 


The lower bound is obtained in the same way. Notice that it is a little bit more involved as we need to
use (B.2) in addition. □


The end of this section is devoted to the proof of Lemma 6.4.

Proof of Equation (16.81) . We can decompose the event {R{i,2) > Na{T^^-= 1} according to 

the jump number of the (backwards in time) first recombination. Recall the definition of NR{i)^^') on 
nagefTOl We will use this event with a different initial condition for (NajNo), which will not necessarily 
be ( [nAK\ ,1). It will however still correspond to the absence of any recombination before the end of the 
first phase. We recall conventions (ED and (ED- With the definition of {aik)m in (16.31) we get 

(6.11) {R{i, 2) > R{z, 1), iVa(r|;,,2)) = 0 

= E = l-l,Na{j^) = I, (Ai2)„_i,Vm <m! < il) : {ail2)jn') 

m>l 

< ^ sup {pi"i(n^,l-l)P« ,)(iVR(*)W)}pW(m< J^(l),iV,(T^_i) = /-l,iV„(r^) = 0 

= ^ sup^Jpi"i(n^,/- l)pW ,)(ArR(i)«)}E«[t/^i(l)], 

and the same expression with the infimum on $n_A \in I_\varepsilon^K \pm 1$ for a lower bound. Adding (6.2) and (A.1)
yields,

P(i)(i?(*, 2) > R{^, 1), iV,(r|'(,,2)) =1) < (1 + ce)^(e-^ '°s ^ [[/^i(l)] 

for a finite $c$, $\varepsilon$ small enough and $K$ large enough, where we used that $(r_1 + r_2)\log K$ is bounded. We
similarly get a lower bound and end the proof of Equation (6.8) by applying (5.10). □

Proof of Equation (6.9). We will decompose the event considered here according to the value of $N_a$ when
the first (backwards in time) recombination occurs. Let us denote by $\zeta_k^K(1)$ the jump number of the last
hitting of $k < \lfloor \varepsilon K \rfloor$ by $N_a$ during the first phase,

(6.12) Cf (1) := sup{m,r,(^ < ff,Na{T^) = fc}. 
and recall (5.19). Then we can define the events

(6.13) NR{l,^^i) := {the first locus of individual i sampled at jump time 

does not recombine from the a- to the ^-population between 0 and t^} 
where ^ G (1), af (1)}. Similarly as in (I6.11L Bayes’ rule leads to: 

(6.14) (i?(i, 1) > 0 I Rii, 2) > i?(*, 1), 7V„(r|',_2)) = 1) 

leKj 

= P^'^(i?(*,l)>0,lV,(r|;,^i))=A:|i?(i,2)>i?(*,l)A(T42))=0, 

k^l 

[sKj 

- PakriA,k-l)p[^^^JNR{l,a,i)))s{k,l), 

k=i ^".4e/f±i 

where for the sake of simplicity we have introduced the notation 

S{k,l):= P('H™<i?(b2),lV,(r^_i)=A:-l,lV,(r^) = A:|lV,(rf(,_2)) = 0- 

m<oo 

The lower bound is obtained by taking the infimum for $n_A$ in $I_\varepsilon^K \pm 1$ and replacing $\sigma$ by $\zeta$. To lighten
the proof, we bound the probability in the brackets for both $\sigma$ and $\zeta$ in Lemma A.1, Equation (A.2).

First we prove that with a probability close to one the a-population size is bigger when the (backwards
in time) first recombination occurs than when the second, of locus $(i, 1)$, occurs. Note that by (5.10) and
Lemma 5.3 there exists a finite $c$ such that for every $l \leq k \leq \lfloor \varepsilon K \rfloor$:

SikJ) < E(i)[C/,^(l)] sup <oc]< 

where we recall that $\mu_\varepsilon < 1$ for $\varepsilon$ small enough. Hence, recalling (6.14) and (6.2), we obtain for $k \geq l$

L^-^J ]^_l 

^ ^I^(l 2) > i?(*,l),iVa(r4_2)) =0 < cr, Y. ^ < "" 

for a finite c and e small enough, which entails 
P^^^(i?(*,l) > 0 I i?(i,2) > i?(i,l),iVa(T|;,^2)) = 1) 


log AT’ 


L 


logKJ ' 


k=i . 

We can therefore ignore all $k \geq l$ in the sum in (6.14) and continue with the case $k < l$. In this setting,
we can bound the sum $S(k, l)$ as follows:

E«[17f_i(l)] - sup E« i_,^[UYi,k-i{m^kUi^m < S{k,l) < E(i)[t/f_i(l)]. 

Bounding the difference between the two bounds above within Equation (6.14) then yields


,i-k 


k=l k=l ^ 

for a finite c by (|6.2II . (15.1011 and Lemma [331 As a consequence. 


Y( sup p«(n^,fc-l)pW ,)(iVi?(/,iT,*))) 5(fc,0-EW[t/f_i(l)] 
±1 


< O 


log A" J ’ 


and thus we can work with E*^^) [[/^^ (1)] ^-s s-n approximation for the sum 5(fc, 1): 

P«(^(bl) > 0 I A(*,2) > A(z,l),iV„(r|',_2)) = 0 

I 

sup pi^j(nA,/c-l)p|;^^ ^)(WA(Z,CT,i)))E(i)[[/;(^i(l)]-kO 


< 


fc=i 


1 


log AT / ’ 
Reasoning in the same way to get a lower bound and using (6.2) and (A.2) we get the existence of a
finite $c$ such that for $K$ large enough and $\varepsilon$ small enough,


1) > 0|R(i, 2) > R{z, 1), iVf = 0 - 


1-1 

E ri _ii 

k ^ 

k^l 




Applying (5.10) and (B.2) yields Equation (6.9). Notice that we have replaced $1/k$ by $1/(k + 1)$. We
used Condition (HH) to do this. □ 


6.2.2. Event [12, 2]’^'^. Recall the definition of [12, 2]^^^ on pageHOl This section is devoted to the proof 
of the following result: 


Proposition 6. Let $i$ be an individual sampled uniformly at the end of the first phase. There exist two
finite constants $c$ and $\varepsilon_0$ such that for $\varepsilon < \varepsilon_0$,


lim sup 

K—^oc 


pW([12,2]:4-)-ri 


1 — e' 


+^2 


log [eifj 


-Il^logLeAJ _p-^logBAJ 


n + r2 


ri + r2(l - fA/fa) 


< Cy/s. 


Proof. As the proof is very similar to the proof of Proposition 5, we will be very brief here and only give
the ingredients. Let us introduce for $l < \lfloor \varepsilon K \rfloor$ the event:

(6.15) RA{1, ^) := {[12, 2]^. \ i?(b 2) = i?(*, 1) > 0, = 1}- 

Then we can rewrite the probability of [12, 2Y^‘l as follows: 


[eK\ 

(6.16) pW([12,2r/?)= ^ pW(RA(Z,*))pW(i?(*,2)=i?(*,l)>0,iVf(r|;,^i)) = 0. 

1=1 

Apart from the point of recombination, the second probability in the above sum coincides with the
probability studied in (6.8) and we obtain for $\varepsilon$ small enough and $K$ large enough,


(6.17) sup b pW(R(^,2) = R(^,l)>0,iVf(r|;,,l)) = 0- 

^<LeA■J 


ri(l - (1 - s)L®-f^J '-(l-s)'+i) _ri±r 2 i i£|LL 


< c 


log AT’ 


s(/ + l) 

for a finite $c$, when substituting $r_2$ by $r_1$ in the fraction which mirrors the recombination probability. The
probability of $RA(l, i)$ is derived in Lemma A.1. Inserting (6.17) and (A.3) into (6.16) yields

z=i 


< Tie 


ri +r2 


^1 +^2 


logteLCJ 


logteAJ _ ^ 


ri+r 2 -fA'^ 2 /fa 


logteAJ _ ^ 


+ C-^E 


ri +r2 ri+r2- fAr 2 /fa 

where we again applied Lemma B.3 to express the sum in a different way, and used the finiteness of
$\limsup_{K \to \infty}(r_1 + r_2)\log K$ assumed in Condition (1.1). Reasoning similarly for the lower bound and
rearranging the terms ends the proof of Proposition 6. □


6.3. Proof of Proposition 1.

Event R2{iY^'>-. By definition and from Lemma 15751 

p(i)(R2(i)(i)) = P(i)(i?(i,2) > 0) - p(^)(R(i,l) > 0) + 

where $R(i, 1)$ and $R(i, 2)$ have been defined in (6.6). But these probabilities have already been derived
in [12] Lemma 7.4, and we get:

$P^{(1)}(R2(i)^{(1)}) = (1 - q_1 q_2) - (1 - q_1) + O_K(\varepsilon) = q_1(1 - q_2) + O_K(\varepsilon),$

where $O_K(\varepsilon)$ satisfies (4.7).

Event i?l|2(i)(^’®“): By definition (see pagefTUl) 

p(i)(Rl|2(i)(i’S“)) = P(^)([2,l[[4“) + P(i)([12,2[(4“). 
The result then follows from Propositions 5 and 6.


Event From Definition (I6.15P and Equation (IA.3I) we obtain for K large enough, 

[eK\ 

pW(m2(z)(i)) = ^ (1 - pW(i?A(Z,*)))pW(i?(7,l) = i?(7,2) > 0,iV„(r|(,_2)) = 1 ) 

1=1 
leK] 


= n 


E _ 

e 


-ZAIil jl - (1 - - (1 - S)'+^ „_Ii±I2 wi£^ 


1^1 


ri 


n + 7-2 - fATl/ia 


s{l + \) 

- log Leif J _ g_n±I2 logLeAJ^ 


Oif (f/e), 


where we again used the statement of Lemma IB.3I to substitute the sum, as well as Equation (El. 


Event Erom Lemma [121 

= 1 - P(^)(i?2(i)(i)) - P(i)(i?12(i)(i)) - + 

This ends the proof of Proposition 1. □


7. Second and third phases 
This section is devoted to the proofs of Propositions [5] and [31 


7.1. Proof of Proposition We need to show that two distinct lineages picked uniformly at the end 
of the second phase coalesce or recombine during that phase only with negligible probability. Let us 
recall the definition of the jumps in dm and denote by U^{2) the number of upcrossings of the 
a-population during the second phase: 

(7.1) U^{2) := #{m,Tf < < Tf + = 1}. 

Let us introduce the event C^: 

Cf := {Tf < S^} n {N^{t) > < t < Tf + te\. 

In particular on the event Cf, for +1^ and j G {1,2} 


^ and v’taKNir^)) < 


e'^K 


32 

£4^:2 • 


Then if we recall the definition of NR{i)^^'> on page HD we have for to G N, 

(7.2) P«(7Vi7(i)(2)|c/^(2) = TO,Cf) > 

But for K large enough, log(l — 8(ri + r 2 )/{e'^K)) > —10(ri + r 2 )l{e'^K) and hence 

p(i)(7Vi?(i)(2)|Cf) > (l - p(i)(C/^(2) > iLloglogiL|Cf 

> M - p(i)(C/^(2) > iLloglogiLlCf )Je--. 

According to Condition (11.11) the exponential term is equivalent to 1 when K is large. Moreover, by 
(|3.5I) . is smaller than 2fiaK on the time interval [T^, + tg] with probability close to 1. When this 

property holds, we can bound the birth number (2) by the sum of 2naK iid Poisson random variables 
with parameter fate- The strong law of large numbers then yields 

lim p(^)(D^(2) > a: log log a:IC f) = 0. 

K—^oo 


Applying again (13.51) to get limx^.ooP(C'{^|rg*’ < oo) = 1 finally gives 

lim P{NR{i)^^^\Tf < oo) = 1. 

K^oo 

The coalescence part in Proposition [3] can be proven in the same way. 
7.2. Proof of Proposition [3l The proof of the asymptotic probability of is the same as for

(|A.3I) . except that the roles of A and a are exchanged. Hence we do not give more details. Note however 
that it extensively uses Lemma [5.61 Let us now focus on the event and introduce 


{no neutral allele of individual i recombines from the a to the A population}. 

Recall the definitions of and C/^(3) in (14.31) and (15.381) respectively. We decompose the probabilities 

according to the number of upcrossings of Na during the third phase and get in the same way as in (EH), 
for m £ N 


p(3)(iVRH(i)(3)|C/^(3) = A > (l - 


fA{ri+r2)e 


fa{.na-M"efK) ’ 

where we recall that and are the analogs of and (defined in (13.ini ') for the 

process N. But for K large enough and e small enough, 


1 - 


fA{ri+r2)e \ 
fa{na-M"eYK) 
Hence we get for a finite constant c and e small enough: 


log ( 


> -2/a 


(ri + r2)e 
fanlK 


p(3)(7VRH(f)(3)) > 

> 


(l - pl«(t/''(3) > exp ( - 

^ ' ye // V /a?T-a ^ 

[ 7 /^( 3 )] 2fA{ri+r2)^/e\ogK \^,^ ^ ^ 

V K\ogK A f,nl 


where we used Lemma [5^ and that (ri + r 2 ) logi£ is bounded (Condition (11.11) '). 


The proof of the last part of Proposition [3] is very similar to that of Proposition E The key arguments 
are that the expectation of the birth number of a-individuals during the third phase under $P^{(3)}$ is of
order $K \log K$ (Lemma 5.6), whereas the probability for two neutral alleles associated with an allele a to
coalesce is of order $1/K^2$ at each birth of an a-individual (Lemma 6.1).


8. Independence of neutral lineages 


This section is dedicated to the proof of Proposition 01 We sample d distinct individuals uniformly at 
the end of the first phase. We recall the definitions of the genealogical events during the first phase on 
page [in] and introduce: 

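$$R(1|2) := \sum_{1 \leq i \leq d} \mathbf{1}_{R1|2(i)^{(1,ga)}}, \quad R(1) := R(1|2) + \sum_{1 \leq i \leq d} \mathbf{1}_{R12(i)^{(1)}} \quad \text{and} \quad R(2) := R(1) + \sum_{1 \leq i \leq d} \mathbf{1}_{R2(i)^{(1)}}.$$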

From Proposition [H we know that i?(l), R(2) and i?(l|2) are sufficient to describe the neutral genealogies 
at the end of the first phase up to a probability negligible with respect to one for large $K$. Let $j$, $k$, $l$ be
three integers such that $l \leq j$ and $j + k \leq d$. We aim at approximating

(8.1) $p(j, k, l) := P(R(1) = j, R(2) = j + k, R(1|2) = l \mid T_\varepsilon^K < S_\varepsilon^K)$

$= P(R(1) = j \mid T_\varepsilon^K < S_\varepsilon^K)\, P(R(2) = j + k \mid T_\varepsilon^K < S_\varepsilon^K, R(1) = j)$

$\times\, P(R(1|2) = l \mid T_\varepsilon^K < S_\varepsilon^K, R(1) = j, R(2) = j + k).$


The approximations of the first two probabilities are direct adaptations of Lemma 5.2 and the proof of
Proposition 2.6 in [18], pp 1623-1624. More precisely, Lemma 7.3 in [19], which states that with high
probability two neutral lineages do not coalesce and then recombine (backwards in time), allows us to get
an equivalent of Lemma 5.2 (with J = 0) in [18]:


P(i?(l) = j|Tf < ) - Q E[F({1 - Fi)"-^ |Tf < 5f]| < c(^ + s) 


for e small enough, where c is a finite constant, 

F, := P(R(z,l) > 0|((iVA,iV„)(r,f ),n < J'"(l)),Tf < 5f), 
and $R(i, 1)$ is defined in (6.6). Then Equations (7.21), (7.23), (7.24) and (7.26) of [19] yield


lim sup 
K—^oo 


E[E/(1 - < S^] - (1 - 

where qi has been defined in (I1.12L which allows to conclude 


( 8 . 2 ) 


lim sup 

K—^oo 


P(i?(l) = j|rf < S^) - (1 - qiYq^-^^ 


< ce, 


< ce, 


for e small enough where c is a finite constant. 


The derivation of the second probability, $P(R(2) = j + k \mid T_\varepsilon^K < S_\varepsilon^K, R(1) = j)$, follows the same
outline. The lineages where N1 does not escape the sweep can be seen as lineages where SL and N1 are
the same locus and the recombination probability between SL-N1 and N2 is $r_2$. This is due to the
independence of the recombinations between SL and N1 and between N1 and N2. Hence we can rewrite
the probability as follows:

$P(R(2) = j + k \mid T_\varepsilon^K < S_\varepsilon^K, R(1) = j) = P(R(2) - R(1) = k \mid T_\varepsilon^K < S_\varepsilon^K, d - R(1) = d - j).$


We can then directly apply the result (8.2) for the law of $R(1)$ and get:


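(8.3) $$\limsup_{K \to \infty} \left| P\big(R(2) = j + k \mid T_\varepsilon^K < S_\varepsilon^K, R(1) = j\big) - \binom{d-j}{k} (1 - q_2)^k q_2^{\,d-j-k} \right| \leq c\varepsilon,$$

for $\varepsilon$ small enough, where $c$ is a finite constant and $q_2$ has been defined in (1.12).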
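
The limiting law in (8.3), combined with the analogous limit for R(1), describes a two-stage binomial sampling. The sketch below is a hedged Python illustration, assuming (as the surrounding derivation suggests) that R(1) is asymptotically binomial with parameters d and 1 - q1, and that R(2) - R(1) given R(1) = j is asymptotically binomial with parameters d - j and 1 - q2. The function names and the numerical values of q1 and q2 are hypothetical, and the third factor of (8.1) is deliberately left out.

```python
from math import comb


def binomial_pmf(n, k, p):
    """Probability of k successes among n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def first_two_factors(d, j, k, q1, q2):
    """Limiting value of P(R(1) = j) * P(R(2) = j + k | R(1) = j) under the assumptions above."""
    return binomial_pmf(d, j, 1 - q1) * binomial_pmf(d - j, k, 1 - q2)


# Illustrative values (assumptions, not taken from the paper).
d, q1, q2 = 4, 0.6, 0.7
total = sum(
    first_two_factors(d, j, k, q1, q2)
    for j in range(d + 1)
    for k in range(d - j + 1)
)
```

Summing over all admissible pairs (j, k) gives a total mass of one, which is a quick sanity check that the two binomial factors define a probability distribution on {(j, k): j + k <= d}.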


The derivation of the last probability in (8.1) is more involved but follows the same spirit. First note
that we only have to focus on genealogies where N1 escapes the sweep. Hence the derivation of the
probability comes down to the derivation of $P(R(1|2) = l \mid T_\varepsilon^K < S_\varepsilon^K, R(1) = j)$. The idea is to propose an
alternative construction of the process with the same law and where we add the recombinations between
N1 and N2 at the end:

• First we construct a trait population process $(N_A, N_a)$ with birth and death rates defined in (1.4).

• Second we "add" the recombinations between SL and N1: at each birth event we draw a Bernoulli
variable with parameter $r_1$ to decide whether there is a recombination or not. If there is a
recombination, the parent giving its neutral allele at N1 is chosen with a probability proportional
to its fertility ($f_A$ or $f_a$).

After this step of the construction we know the genealogies of the d neutral alleles at N1 sampled
at the end of the sweep. We label $(i_1, \ldots, i_j)$ the j sampled neutral alleles at N1 which experience
a recombination between SL and N1 in their genealogy.

• Third we "add" the recombinations between N1 and N2 sequentially in the lineages where there
is already a recombination between SL and N1: we first follow backward in time the lineage of $i_1$
and at each birth event we draw a Bernoulli variable with parameter $r_2$ to decide whether there
is a recombination or not, and choose the parent of the neutral allele at N2 as in the second step.
Then we do the same with the lineage of $i_2$, and so on until the lineage of $i_j$.

• Finally we "add" the recombinations between N1 and N2 in those lineages which were not marked
with any recombination between SL and N1.
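
The elementary step used in the second and third bullets (a Bernoulli draw followed by a fertility-weighted choice of the donor parent) can be sketched as follows. This is a minimal illustration in Python, not the authors' construction: the function name `recombination_step` and its signature are hypothetical, it only returns the type (A or a) of the donor parent, and it ignores the exclusion of the first parent.

```python
import random


def recombination_step(rng, r, f_A, f_a, n_A, n_a):
    """One birth event.

    Returns None if no recombination occurs, otherwise the type ('A' or 'a')
    of the parent giving the neutral allele, chosen with probability
    proportional to the total fertility of each type.
    """
    # Bernoulli variable with parameter r: is there a recombination?
    if rng.random() >= r:
        return None
    weight_A = f_A * n_A  # total fertility of the A-population
    weight_a = f_a * n_a  # total fertility of the a-population
    if rng.random() * (weight_A + weight_a) < weight_A:
        return "A"
    return "a"


rng = random.Random(0)  # fixed seed, for reproducibility of the illustration
events = [recombination_step(rng, 0.1, 1.0, 1.5, 1000, 20) for _ in range(10)]
```

Applying this step once per birth event along a lineage, first with parameter r_1 and later with parameter r_2, mirrors the sequential order of the construction described above.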

Such a construction generates a process distributed as the original process and facilitates the study of
the dependencies between lineages. According to Lemma 6.3, with high probability there is
no recombination between SL and N1 after (backwards in time) a coalescence at locus N1 among the d
sampled individuals. In the same way, there is no coalescence at locus N1 after a recombination between
SL and N1 in the A-population (this is due to the large number of A-individuals; similar proof as for
the last probability of Proposition 3). Hence if we introduce

$NC(j) := \{$there is no coalescence between lineages $(i_1, \ldots, i_j)$ at locus N1$\}$,

we get: 

P(i?(I|2) = /|Tf < ,A(1) = j) = P(i?(I|2) = < Sf,R{l)=j,NC{j))+o{^^). 
With the construction of the alternative process we can also define sequentially for $1 \leq k \leq j$:

$NC(j, k) := \{$there is no coalescence between lineages $(i_1, \ldots, i_j)$ after completion of the
process of adding the recombinations between N1 and N2 in the lineage $i_k\}$.

Then, if we introduce for $1 \leq k \leq j$ and $\delta \in \{0, 1\}$

$\{r_{i_k} = \delta\} := \{$there is $\delta$ recombination between N1 and N2 in the lineage $i_k\}$,

then for $(\delta_1, \ldots, \delta_j) \in \{0, 1\}^j$

= 4,1 < fc < j|rf < = j) = 

n PK = 4|rf < sf, Ril) = J, NC{j),NCU, 1), NC{J, fc - 1)) + o(^) • 

l<k<j 

Indeed, the probability that the event $NC(j, k)$ is not realized after witnessing the recombinations between
N1 and N2 in lineage $i_k$ has order $\log K / K$ according to Lemma 6.3. But for $1 \leq k \leq j$,

(8.4) = 4|Tf < S^,R{l)=j,NC{j),...,NCU,k-l)) 

Pin, = 4, Rji) = J, NCjj ),..., NCU, k - i)|rf < gf) 

P(i?(l) = J, iVC'(j), NCU, fc - < S^) 

Pin,=Sk,R{i)=jK^ <Sf)-Pin, =Sk,Rii)= j,iNCij)n...n NCU, k-i)r\Tf <S^) 
P(i?(i) = j|Tf < ) - P(i?(i) = J, iNCU) n... n NCU, fc - < SY) 

and according to Lemma [6.31 and Coupling p.l4p . there exists a finite c such that for K large enough 
and e small enough, 

PiiNCU) n ... n NCU, fc - YmY < + e). 

As PUi, = Sk, RU) = j\TY < Sf) does not go to 0 when K goes to infinity, we get 


P(r., = 4|Tf < SY,Ril)=j,NCU),...,NCij,k-l)) = Pin, = 4|Tf < ,i?(l) = j)+o(i^+e 

^ P(m|2(zfc)(Pg-)|Tf <^f) / P(m|2(zfc)(i-g°)|rf <^f) x (logK n 

P(i?(zfc,l) > 0|rf < 5f) ^ PiRiik,l)>0\T^^ <SY) ) y K ^ ) 

where we recall the definition of 1) in (16.611 . the definition of i?l|2(ife)*^^’®“^ on page m and we used 
Proposition [T] Adding Equations (18.21) and (18.31) we finally obtain: 


pU,k,l) 

(8.5) 


(") (’■ C) (' - T^i)' 

jij—7yt|—- ®))‘«r'(i -n- ®)' + Ok(s). 


This ends the proof of the independence between genealogies during the first phase. 

The derivation of the asymptotic independence of neutral lineages during the third phase is an easy
adaptation of Lemma 5.2 and the proof of Proposition 2.6 in [18], pp 1623-1624, as with high probability
two lineages do not coalesce during this phase. □


Appendix A. Lemma A.1

Recall the definition of NRii)^^^ on page [ini and Definitions (16.1311 and (16.1511 . Then we have the 
following approximations for large K. 

Lemma A.1. There exist three finite constants $c$, $K_0$ and $\varepsilon_0$ such that for every $K > K_0$ and $\varepsilon < \varepsilon_0$

ri -t- r2 


(A.l) 


sup 

nA€l^±l,l<leK\ 


P£,0(^^W^'^)-exp(-^^log 




< C^/£, 
(A.2)


sup sup 

rG{C,cr} [eK\ 


i) - exp ( - ^ log I < c^e. 


(A.3) 


sup 

l<leK\ 


- exp ^ log< cy/e. 


Proof. Let us introduce the sigma-algebra generated by the trait population process 

P:=a(iNA,N,){T^),T^<f,^^ 


We use some ideas developed in [18] and extended in [2] towards the two-locus case. The proof, although
quite technical, can be summarized easily: for $(g, b, c, d, f) \in \mathbb{R}_+^5$ the Triangle Inequality and the Mean
Value Theorem imply

$|g - e^{-b}| \leq |g - e^{-c}| + |c - d| + |d - f| + |f - b|.$
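
The inequality above combines two elementary steps, spelled out here as a sketch. It assumes that $b$ and $c$ are nonnegative, so that $x \mapsto e^{-x}$ is 1-Lipschitz on the relevant range.

```latex
\begin{align*}
|g - e^{-b}|
  &\le |g - e^{-c}| + |e^{-c} - e^{-b}|
     && \text{(triangle inequality)} \\
  &\le |g - e^{-c}| + |c - b|
     && \text{(mean value theorem, since } |\tfrac{d}{dx} e^{-x}| \le 1 \text{ on } \mathbb{R}_+ \text{)} \\
  &\le |g - e^{-c}| + |c - d| + |d - f| + |f - b|
     && \text{(triangle inequality).}
\end{align*}
```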

Hence for all random variables $(X_1, X_2) \in \mathbb{R}_+^2$ and every measurable event $C$:


1±I2 




< 


p(i)(C'|X)-e-^i 


X -X 2 


X2-E(i)[X2] + E(i)[X 2] - ll^logi^ 


By taking the expectation and applying Jensen and Cauchy-Schwarz Inequalities, we obtain: 


(A.4) p(i)(C')-e' 


. >-l + '-2 log Le^J 


< e(^) 


p(i)(C'|X) -e-^1 


E(i) 


X 1 -X 2 


+ x/Var(X2) + E(i)[X2]- 


ri + r2 


log 


[eK\ 


Hence the idea is to find appropriate random variables $(X_1, X_2) \in \mathbb{R}_+^2$ to get small quantities on the
right hand side.
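
The proofs that follow rely on the fact that a product of conditional probabilities close to one behaves like the exponential of minus the sum of the small terms, and that a harmonic-type sum behaves like a logarithm. The Python sketch below illustrates this on a deliberately simplified deterministic toy case, assuming exactly one birth at each population size k and a recombination probability rate/(k + 1) at that birth. This simplification, the function names and the numerical values are assumptions made for illustration only; the paper deals with random numbers of upcrossings.

```python
import math


def no_recombination_product(rate, l, n):
    """Product over k = l, ..., n - 1 of (1 - rate / (k + 1))."""
    prod = 1.0
    for k in range(l, n):
        prod *= 1.0 - rate / (k + 1)
    return prod


def exponential_approximation(rate, l, n):
    """exp(-rate * log(n / l)), the Poisson-type approximation of the product."""
    return math.exp(-rate * math.log(n / l))


# Illustrative values (assumptions): small rate, population size going from 10 to 10000.
exact = no_recombination_product(0.1, 10, 10_000)
approx = exponential_approximation(0.1, 10, 10_000)
gap = abs(exact - approx)
```

For rate = 1 the product telescopes to l/n, so both functions agree exactly in that case; for small rates the gap stays small compared with the probabilities themselves.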

Proof of Equation (A.1): The first step consists in working conditionally on $\mathcal{F}$, describing this probability
as a product of conditional probabilities close to one, as well as in deriving a Poisson approximation. To
this aim, we define for $m \in \mathbb{N}$:

:= l{rK<fK}l{JVa(T«)-X(T^_j)=i}(pii +pii)(^A, fVa)(T,^_i), 

where we recall the definition of the in Definition 16.11 Notice that Remark [Tjp. [^implies that for 
p G {1,2}, UA G ± I and I < [eATj, 

(A.5) 


(I-c£:)(r'i-hr2)^(^ 


E^. (I) 




{nA,l) k 

{k + I)^ 


) ^ < {ri+r2r{j2 


e 1{\ „C/f (1) 


fc=i 


{nA,l) k 

{k + I)^ 


Then, similarly as in m, we have for ua G ± 1 and I < leK\ 


If we introduce the variable, 


m —1 


rjP2) _ ^ 0(i2)(^)^ 

771=1 

which will play the role of Xi in (IA.4E we get by following the path of Lemma 3.6 in m-- 

00 00 

A-6) ^ 


771 = 1 


771=1 


< — 

“ log^AT 


for $K$ large enough, $n_A \in I_\varepsilon^K \pm 1$, $l < \lfloor \varepsilon K \rfloor$ and a finite $c$ (which can be chosen independently of $l$), where
we used Equations (5.10), (5.11) and (A.5), and Condition (1.1) for the last inequality. Next we introduce
an approximation of the random variable namely


m—1 

which will play the role of in (IA.4I) . For ua G ± 1 and I < [eiFj: 

OO 

(A.9) 


m—1 


. ri+r2 ir^{l-s-{e)y ^ {n + r2) 
< - ^ - 2^ -- < c- 


■s+(e)s?.(e) ^ fc + 1 


I 


for a finite $c$ and $\varepsilon$ small enough, where we used (A.5) and (5.11) for the first inequality, and (B.2) for
the second one. The latter ensures that $c$ can be chosen independently of $l$. The expected value of
can be bounded by using (jA.5|) . (I5.10p and (jB.2l) 


[EiCJ-l 


> (l-Ce)(^l+^2) X! -j-^{ 


1 /l-(l-s)L^'^J-'=-(l-s)'=+i 


— ce 


k^l 


(A.IO) 


. .ri+r 2 [eK\ 

> (1 — ce)-log ■ 


s I log K 

for a finite c and e small enough. For the upper bound we get similarly. 


(A.ll) 


< (l + ce)^^^^log^^. 


The last step consists in bounding the variance of As the calculation of this variance is quite 

involved, we introduce an approximation of namely 


= (12) ._ 


-El 


ri + r2 


LeifJ-l 




= E 

k=N^(0) 


fc + 1 ^ ^ ' 


Equation (16.21) yields (1 — for a finite c and e small enough. Hence 


(A.12) 




( 1 ) 




< ce£f 
— {riA 

< c£(n + 'V' <'»' 

k,k'=l 


(fc + l)(fc' + 1) 


where we used (I5.15P and (IB.31) which ensure that U^(\) is smaller than a geometric random variable 
with parameter > s_(e). Thus it is enough to bound Var|^^^ . Thanks to (15.1211 and 

Condition (HI) we get: 

L-;^-Cov« _,)(t/f(l),C/,^(l)) 


= (ri+r2)" ^ 


k,k'—l 


(fc + i)(fc' + 1 ) 


< 2{n+r2f 


[EifJ-l ,(fe'-fc)/2 , 

~r I 


< C 


\ogleK\ 

k<^=i ^ 

Recalling (A.12) and again Condition (1.1), we finally obtain


(c + elog[£:A"J). 


(A.13) 


limsupVar^^^ 

iC-s-oo ^ ^ 
for a finite c independent of I and e small enough. Applying (IA.4I1 with Xi = and X 2 = yields




.n^iogiitc 




m—1 




We end the proof of Equation (A.1) with Inequalities (A.6), (A.8), (A.13), (A.10) and (A.11).


I 


Proof of (IA.2|1 ; There is a supplementary difficulty due to the randomness of 2 ))- In Ih® previous 

case we were interested in an event before the first hitting of $\lfloor \varepsilon K \rfloor$, while in the current case, the
conditioning on the value of $N_a(\tau^K_{R(i,2)})$ does not tell us how many times $N_a$ has hit this value before.
This is why we have introduced $NR(l, \sigma, i)$ and $NR(l, \zeta, i)$ in (6.13). Define for $m \geq 1$,




We again condition on the trait population process and get for $n_A \in I_\varepsilon^K \pm 1$ and $k < l < \lfloor \varepsilon K \rfloor$,


'( 1 ) 


(A.14) 


p\Z,k)iNRil,a,^m= 


m—l 


and the same expression with $\sigma$ replacing $\zeta$. We define the corresponding parameters for the Poisson
approximation as follows: 


Cf(i) 

:= and := 

m—l m—l 

They will play the role of $X_1$ in (A.4). We will show that both can be approximated by:

Cf(i) 

m—l 

which will play the role of $X_2$ in (A.4). Recall Definitions (5.8), (5.17) and (5.20). On the one hand, for
$n_A \in I_\varepsilon^K \pm 1$ and $k < \lfloor \varepsilon K \rfloor$,

cr(i) 

= P(nl.fe) [ H + l{IVf(r,^)>q)_ 

m—l 
k — 1 

< (1)]^ sup sup ^[nA,k-l)PnA,k,jif] 

j=l'^A&Ip UA^Ip^l 

VeK\ 

(A.16) +E(^)[D,^(1)] ^ sup p-^l[nA,j) sup e[^^^ (1) < 00 ], 

nA6/f±l 


where we used that in the first phase, under the number of excursions below k (resp. above l) is
equal to Df{l) (resp. Uf{\) - 1). Applying Inequality (5.10), Lemma lOl and Equation (6.2) we get
the existence of a finite $c$ such that for $\varepsilon$ small enough:


E 


( 1 ) 

(nA,k) 


h 


(i).+ 


LeAJ \j-l\ 

~(l)l / Re ^ 

ri\ ’] < cri ^ < 


i=i 


c 

logiE’ 
as $\mu_\varepsilon \in (0,1)$ for $\varepsilon$ small enough and by Condition (1.1). On the other hand, by using the same results
as in (A.16), we get


E 


( 1 ) 

(UAyk) 




< 


E 


( 1 ) 






C/"(l) 

m=cr,^(l) + l 


< 


cri 


(S 


3 + 1 



< 


c 

logK 


This shows that it is sufficient to use for the Poisson approximation. From (A.6) we deduce that
this approximation holds true up to terms of order $1/\log K$. Recalling once again (A.2), we see that
it only remains to calculate the expected value of and to bound its variance. The expectation can
be approximated in the same way as the expected value of from the previous part in (A.10) and
(A.11):

A comparison of the definitions of in (A.15) and in (A.7) shows that the variance of iy^^ can
be bounded by the same expression, that is, a constant times $\varepsilon$. This ends the proof of Equation (A.2).


Proof of Equation (A.3): It can be done in a similar way as for Equations (A.1) and (A.2). We have the
following lower and upper bounds:

(A.18) n [^ - pfAiNA.Na){T^)] < 1 - pW(i?A(Z,*)|J-) < n - P^l\{NA.Na){T^^) . 

m—l m—1 

Once again we aim at deriving a Poisson approximation. As a birth event in the A-population is needed
to see a recombination within the A-population, bounds on the expected number of jumps will concern
the process $N_a$ and we have to use Lemma 5.2. □


Appendix B. Technical results 


This section is dedicated to technical results needed in the proofs. First we recall a well-known result
on the hitting times of birth and death processes, which can be found in [18], Lemma 3.1:


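Proposition 7. Let $Z = (Z_t)_{t \geq 0}$ be a birth and death process with individual birth and death rates $b$ and
$d$. For $i \in \mathbb{Z}_+$, $T_i = \inf\{t \geq 0, Z_t = i\}$ and $\mathbb{P}_i$ is the law of $Z$ when $Z_0 = i$. Then for $(i,j,k) \in \mathbb{Z}_+^3$ such
that $j \in (i,k)$,

(B.1)    $\mathbb{P}_j(T_k < T_i) = \dfrac{1 - (d/b)^{j-i}}{1 - (d/b)^{k-i}}.$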
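As a sanity check of (B.1), the following minimal Python sketch compares the closed form with the value obtained from the first-step recursion of the embedded jump chain. It assumes $b \neq d$ and uses the fact that, with individual rates, the process jumps up with probability $b/(b+d)$ from every positive state. The function names are illustrative and do not come from the paper.

```python
from fractions import Fraction


def hitting_prob_formula(b, d, i, j, k):
    """Closed form (B.1): probability to reach k before i, started from j (b != d)."""
    rho = Fraction(d) / Fraction(b)
    return (1 - rho ** (j - i)) / (1 - rho ** (k - i))


def hitting_prob_recursion(b, d, i, j, k):
    """Same probability from the harmonic equation h(n) = p*h(n+1) + q*h(n-1)."""
    b, d = Fraction(b), Fraction(d)
    p = b / (b + d)  # probability that the next jump is a birth
    q = 1 - p        # probability that the next jump is a death
    # Unnormalised solution with h(i) = 0 and h(i+1) = 1, extended up to state k.
    h = [Fraction(0), Fraction(1)]
    for _ in range(k - i - 1):
        h.append((h[-1] - q * h[-2]) / p)
    # Normalise so that the boundary value at k equals 1.
    return h[j - i] / h[k - i]


if __name__ == "__main__":
    print(hitting_prob_formula(2, 1, 1, 3, 6), hitting_prob_recursion(2, 1, 1, 3, 6))
```

Exact rational arithmetic is used so that the two values can be compared with `==` instead of a floating-point tolerance.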


We also recall Lemma 3.5 in [18] and the first part of Equation (A. 16) in [19] which are used several 
times: 


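Lemma B.1.

• If $a > 1$ there is a $C$ such that for every $N \in \mathbb{N}$,

(B.2)    $\sum_{l=1}^{N} \dfrac{a^l}{l} \leq \dfrac{C a^N}{N}.$

• Recall Definition (5.14). Then for $(s_1, s_2) \in (0,1)^2$ and $k < \lfloor \varepsilon K \rfloor$,

(B.3)    $q_k^{(s_1 \wedge s_2,\, s_1 \vee s_2)} \geq s_1 \wedge s_2.$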


Finally, we state two technical results. The first one can be proven by using characteristic functions;
the proof of the second lemma is given below:
Lemma B.2. Let $V$ be a geometric random variable with parameter $p_1$ and $(G_i, i \in \mathbb{N})$ a sequence of
independent geometric random variables with parameter $p_2$, independent of $V$. Then the random variable

$Z := \sum_{i \leq V} G_i$

is geometrically distributed with parameter $p_1 p_2$.
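The statement of Lemma B.2 can be checked by direct enumeration. The minimal sketch below is not the characteristic-function proof mentioned above. It assumes that geometric variables take values in $\{1, 2, \ldots\}$ with $P(G = n) = p(1-p)^{n-1}$ and that the sum runs over $i \leq V$. It uses the fact that a sum of $v$ independent such variables has a negative binomial law. The helper names are hypothetical.

```python
from fractions import Fraction
from math import comb


def geometric_pmf(p, n):
    """P(G = n) for a geometric variable on {1, 2, ...} with parameter p."""
    return p * (1 - p) ** (n - 1)


def compound_pmf(p1, p2, n):
    """P(Z = n) for Z = G_1 + ... + G_V, with V ~ Geom(p1) independent of the G_i ~ Geom(p2)."""
    total = Fraction(0)
    for v in range(1, n + 1):
        # Law of G_1 + ... + G_v: negative binomial on {v, v+1, ...}.
        sum_of_v = comb(n - 1, v - 1) * p2 ** v * (1 - p2) ** (n - v)
        total += geometric_pmf(p1, v) * sum_of_v
    return total


if __name__ == "__main__":
    p1, p2 = Fraction(1, 3), Fraction(2, 5)
    for n in range(1, 6):
        print(n, compound_pmf(p1, p2, n), geometric_pmf(p1 * p2, n))
```

Passing the parameters as `Fraction` objects keeps the comparison exact, so the two probability mass functions can be tested for equality term by term.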


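Lemma B.3. Let $(c_N, N \in \mathbb{N})$ be a bounded sequence of $\mathbb{R}$. Then there exists a finite constant $c$ such
that

$$\limsup_{N \to \infty} \sup_{k \leq N} \left| \sum_{l=1}^{k} \frac{l^{c_N/\log N}}{l+1} - \frac{\log N}{c_N}\left(e^{c_N \frac{\log k}{\log N}} - 1\right) \right| \leq c.$$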


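Proof. We prove the Lemma for a sequence $(c_N, N \in \mathbb{N})$ in $\mathbb{R}^*$ and extend the result by using the
convention

$$\frac{\log N}{c_N}\left(e^{c_N \frac{\log k}{\log N}} - 1\right)\Big|_{c_N = 0} = \log k.$$

The idea is to compare the sum with the integral

$$\int_1^k x^{\frac{c_N}{\log N} - 1}\,dx = \frac{\log N}{c_N}\left(e^{c_N \frac{\log k}{\log N}} - 1\right).$$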


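Let $l$ be in $\{1, \ldots, N-1\}$. Then we have

$$\int_l^{l+1} x^{\frac{c_N}{\log N} - 1}\,dx - \frac{l^{c_N/\log N}}{l+1} = \frac{\log N}{c_N}\left((l+1)^{\frac{c_N}{\log N}} - l^{\frac{c_N}{\log N}}\right) - \frac{l^{c_N/\log N}}{l+1} = \frac{\log N}{c_N}\, l^{\frac{c_N}{\log N}} \left( \left(1 + \frac{1}{l}\right)^{\frac{c_N}{\log N}} - 1 - \frac{c_N}{(l+1)\log N} \right).$$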


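An application of the Taylor-Lagrange formula yields that

$$\left(1 + \frac{1}{l}\right)^{\frac{c_N}{\log N}} - 1 = \frac{c_N}{l \log N} + \frac{c_N}{2 l^2 \log N}\left(\frac{c_N}{\log N} - 1\right)(1 + x)^{\frac{c_N}{\log N} - 2},$$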

where $x$ belongs to $[0, 1/l]$. As the sequence $(c_N, N \in \mathbb{N})$ is bounded, we deduce that there exists a finite
constant $c$ such that

$$\left| \int_l^{l+1} x^{\frac{c_N}{\log N} - 1}\,dx - \frac{l^{c_N/\log N}}{l+1} \right| \leq \frac{c}{l^2}.$$

This ends the proof of Lemma B.3.
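The sum-versus-integral comparison behind Lemma B.3 can be illustrated numerically. The sketch below rests on an assumed reading of the lemma: the sum is $\sum_{l=1}^{k} l^{c_N/\log N}/(l+1)$, it is compared with $\int_1^k x^{c_N/\log N - 1}\,dx$, and the sequence is constant, $c_N = c$. The function name `sup_gap` is hypothetical.

```python
import math


def sup_gap(N, c):
    """Largest gap over k <= N between the partial sum and the integral, for c_N = c."""
    a = c / math.log(N)  # exponent c_N / log N
    partial_sum = 0.0
    worst = 0.0
    for k in range(1, N + 1):
        partial_sum += k ** a / (k + 1)
        # Integral of x^(a-1) over [1, k]; the value log k is the convention for c_N = 0.
        integral = math.log(k) if a == 0 else (k ** a - 1) / a
        worst = max(worst, abs(partial_sum - integral))
    return worst


if __name__ == "__main__":
    for N in (10**2, 10**3, 10**4):
        print(N, [round(sup_gap(N, c), 4) for c in (-1.0, 0.0, 1.0)])
```

If the reading above is correct, the printed gaps should stay of order one as $N$ grows, which is what the lemma asserts. A finite computation of this kind illustrates the bound and does not prove it.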